...
However, this decision creates a number of problems.
First, while data is redacted, we believe that publishing individual trails represents an unacceptable breach of privacy. Our intention is to protect users' privacy by pooling their data with that from other patients, and using this “crowd” to provide privacy. To this end, we are planning to determine some minimum number of user records that can be published at a time.
...
Because the data can trivially be scraped and stored by 3rd parties, who could therefore reconstruct the entire version history of the JSON data, this minimum is also the smallest number of cases that can be added in a single increment.
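To illustrate why the crowd size also bounds the increment size, here is a minimal sketch. It assumes, purely hypothetically, that each published version is a JSON array of redacted trail records (the `trail` field and the coordinates are invented for illustration). Anyone holding two consecutive versions can recover exactly the records added between them.

```python
import json


def newly_added(previous_json: str, current_json: str) -> list:
    """Return records in the current published version that were not in the previous one."""
    # Hypothetical format: each version is a JSON array of redacted trail records.
    previous = json.loads(previous_json)
    current = json.loads(current_json)
    # Canonicalise each record so identical records compare equal regardless of key order.
    seen = {json.dumps(record, sort_keys=True) for record in previous}
    return [r for r in current if json.dumps(r, sort_keys=True) not in seen]


# Two consecutive versions scraped by a third party, one case apart (invented data).
v1 = json.dumps([{"trail": [[18.54, -72.34]]}])
v2 = json.dumps([{"trail": [[18.54, -72.34]]}, {"trail": [[18.60, -72.30]]}])
print(newly_added(v1, v2))  # [{'trail': [[18.6, -72.3]]}]
```

With an increment of one case, the diff is that one person's whole trail. The scraper only sees a crowd of N if every increment contains at least N cases.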
We don’t know what this minimum is yet, but let’s suppose it’s 10. That means you can only publish case data in batches of 10 cases.
Until you have collected 10 cases, you just have to wait. In low-volume deployments (which all our initial targets are), this is a major problem.
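A minimal sketch of this batching rule follows. It assumes a hypothetical `BatchPublisher` that receives consented cases one at a time. Nothing is released until the batch reaches the minimum size, so the first case in each batch has to wait for all the others.

```python
from dataclasses import dataclass, field


@dataclass
class BatchPublisher:
    """Hypothetical publisher: holds consented cases until at least `min_batch` are queued."""

    min_batch: int = 10  # assumed crowd size, matching the "suppose it's 10" above
    pending: list = field(default_factory=list)

    def add_case(self, record) -> list:
        """Queue a record; return a batch to publish, or [] if we must keep waiting."""
        self.pending.append(record)
        if len(self.pending) < self.min_batch:
            return []  # not enough cases yet to form the "crowd"
        batch, self.pending = self.pending, []
        return batch


pub = BatchPublisher(min_batch=10)
released = [pub.add_case(f"case-{i}") for i in range(12)]
# Only the 10th call returns a batch; the last two cases wait for the next one.
print([len(batch) for batch in released])  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 0, 0]
```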
...
In Haiti, for example, the number of new cases per day is often between 1 and 10 - in fact it’s often fewer than 5 (unfortunately, the rates have been going up in the last week).
It’s unrealistic to expect 100% of these to agree to publish their data - that figure could easily be 50% or lower.
This means that it will be common for case data from individuals to have a publication delay of 2 or 3 days, perhaps even longer.
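The 2-3 day estimate can be checked with some rough arithmetic. This is a sketch that assumes consented cases arrive at a steady rate, which ignores day-to-day variation. Under that assumption, a batch of N fills in N / (cases per day × consent rate) days. The first case in a batch waits for the other N - 1, and on average a case waits for about half of them.

```python
def batch_wait_days(cases_per_day: float, consent_rate: float, min_batch: int = 10):
    """Rough (average, worst-case) publication delay in days under fixed-size batching.

    Assumes a steady arrival rate of consented cases; real arrivals are lumpier.
    """
    rate = cases_per_day * consent_rate  # consented cases per day
    worst = (min_batch - 1) / rate  # first case in a batch waits for all the others
    average = (min_batch - 1) / 2 / rate  # a typical case waits for about half of them
    return average, worst


for cases in (3, 4, 5, 10):
    avg, worst = batch_wait_days(cases, consent_rate=0.5)
    print(f"{cases} cases/day at 50% consent: average {avg:.2f}d, worst {worst:.2f}d")
```

For example, at 4 cases/day with 50% consent and a batch of 10, this gives an average wait of 2.25 days and a worst case of 4.5 days, in line with the estimate above.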
...
This chart from Oxford (https://science.sciencemag.org/content/368/6491/eabb6936) shows why this is such a big problem.
It shows transmission by day since infection.
...
For patients notified by the app, we have a bit more of a head start. However, I don’t believe we can go on to notify their contacts until they have either symptoms or a positive test.
Having been notified on day 7 of the index patient’s infection, these contacts are (on average) already 2 days past infection (some may be much further along). By the time they’ve been tested, they are at day 3. They will already have passed the infection on to a small but significant 3rd Tier. If we add another 2-3d delay before notifying (and therefore quarantining) the 3rd Tier, we massively undermine the effectiveness of the solution.
With every Tier we add this 2-3d delay, and each time we miss a huge opportunity to contain the virus.
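A toy timeline shows how the publication delay compounds across Tiers. This is a sketch under loudly simplified assumptions: the hypothetical `hop_days` parameter is the time for a notified contact to become a trigger (symptoms or positive test), and it is set to a fixed, invented value of 3 days. The Oxford transmission curve is not modelled. Tier 1 is the index patient, and day 0 is when they trigger.

```python
def notification_days(tiers: int, hop_days: float, publication_delay: float) -> list:
    """Day (after the index patient's trigger) on which each Tier from 2 up is notified.

    hop_days and publication_delay are hypothetical, fixed per-hop durations.
    """
    days = []
    t = 0.0
    for _tier in range(2, tiers + 1):
        t += publication_delay  # wait for the previous Tier's data to be published
        days.append(t)  # this Tier is notified now
        t += hop_days  # then needs time to become symptomatic or test positive
    return days


fast = notification_days(4, hop_days=3, publication_delay=0)
slow = notification_days(4, hop_days=3, publication_delay=2.5)
# Extra lateness of Tiers 2, 3 and 4 caused by batching alone.
print([s - f for s, f in zip(slow, fast)])  # [2.5, 5.0, 7.5]
```

The extra lateness grows linearly with the number of hops, which is the point above: each Tier inherits every earlier delay.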
I haven’t done detailed modelling of the impact of a 2-3d delay in notifications, but this modelling from Oxford shows the impact of a 0 to 3 day delay (0d on the right, 3d on the left) on the percentage of transmissions that must be identified, and on the quarantine success rate required, to get R below 1.0.
...
We’ll do slightly better for non-index patients, who get tested based on an App notification rather than symptoms - but the impact of a 2-3d delay is still huge.
It is true that as the virus spreads in an area, the case volume will go up, and we’ll get to the point where we have enough cases/day to address our privacy concerns, and can therefore publish every 12 or 24 hours.
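To put a number on "enough cases/day", here is a back-of-the-envelope sketch. It assumes a steady arrival rate and reuses the illustrative figures from earlier (a crowd of 10, 50% consent). The question it answers is how many new cases per day are needed to fill one batch per publication window.

```python
import math


def min_cases_per_day(min_batch: int, consent_rate: float, window_hours: float) -> int:
    """Smallest daily case count that fills one batch per publication window.

    Assumes consented cases arrive steadily; all inputs are illustrative.
    """
    batches_per_day = 24 / window_hours
    consented_needed = min_batch * batches_per_day  # consented cases needed per day
    return math.ceil(consented_needed / consent_rate)


print(min_cases_per_day(10, 0.5, 24))  # 20
print(min_cases_per_day(10, 0.5, 12))  # 40
```

Under these assumptions, daily publication needs around 20 new cases/day and 12-hourly publication around 40, well above the 1-10 cases/day we see in places like Haiti.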
Personally, I am not at all comfortable with a solution that only works if things get worse before they get better, and that, further, won’t actually be able to help eradicate the disease, because it is ineffective with small numbers of cases.
In short - for the solution to be effective, we need to be able to publish individual users’ data promptly (within 12 hours; sooner would be significantly better). We can’t do that if we depend on “crowd” effects for privacy, because the crowds simply will not be there in many of our small-scale deployments.
Note also that my assumption that a crowd of 10 will be enough for privacy purposes may be a major underestimate. In some jurisdictions, it is not even clear that there is any value of N for which pooled data on N users can be considered adequately private. Adam Leon Smith tells me that this would be the case in the EU (though that may change if we got more explicit consent from users).
And beyond what the law says, I would not be surprised if there were a significant public outcry when it is discovered that we publish this personal data in plain text. The only defense I can see against that is total transparency with patients about how their data will be published, and how exposed this could make them - and I suspect that such transparency will massively hinder uptake.