...
This concerned me because we have a huge amount of product function dedicated to Exposure Notifications: Safe Places Redaction & Publishing, Health Authority config in Safe Paths, and the Exposure Notification processing itself. Between them these add up to about 50% of the product. Is it really not needed?
...
The notification box(es) for the cinema or theater show might be a larger area (covering the whole cinema/theater lobby), but specifically targeting 10-minute windows at the start and end time of the show, with a goal to notify anyone who is spotted in that area, in that time window (even if they only match on a single GPS point). Even more sophisticated: we could also match on the fact that their phone goes offline for the duration of the show.
...
By dramatically reducing the amount of personal data we share, moving from a “redacted trail” model to a “zones of concern” model, the personal nature of the data is massively reduced, and therefore the privacy issues are reduced.
But I don’t think the privacy risks are eliminated - we need to think not only about the privacy of the infected patient, but also the privacy of businesses, and other individuals known to frequent affected locations. So I don’t think the encryption requirement goes away.
Potentially we need to serve up a much more semantically rich set of descriptions of “zones of concern” - not just space/time boxes like the restaurant example, but more sophisticated examples like the cinema/theater and bus examples.
Our current encryption proposals assume that the data served by the HA is a homogeneous set of place/time data points. A different approach may be needed for these more sophisticated examples.
It’s not clear to me how a user’s space-time points can be assessed against a rich description of a “zone of concern”, without either the space-time point being disclosed to a server (server-side comparison), or the description of the “zone of concern” being disclosed to the App (client-side comparison). The problem being that a cryptographic hash function will not preserve any of the topology of the space-time region being hashed.
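To illustrate the topology point, here is a minimal sketch (the two geohash strings are arbitrary examples I picked, not real data). Two geohashes that sit in the same 6-character parent cell share a long prefix, but their SHA-256 digests have no useful relationship, so "nearby" cannot be tested on the hashed values.

```python
import hashlib
import os.path


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def common_prefix_len(x: str, y: str) -> int:
    """Length of the shared leading substring of x and y."""
    return len(os.path.commonprefix([x, y]))


# Two example geohashes in the same 6-character parent cell (illustrative values).
a, b = "u4pruyd", "u4pruyf"

print("geohash prefix shared:", common_prefix_len(a, b))
print("digest prefix shared: ", common_prefix_len(sha256_hex(a), sha256_hex(b)))
```

The only comparison that survives hashing is exact equality, which is why the discrete-points approach described next relies on matching identical cells rather than nearby ones.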
A possible solution would be for the “zone of concern” to be resolved into a discrete set of space-time points, which could be served to the client in hashed form for the client to compare against hashes of their own location data. These hashed space-time points could retain a “criticality” value without any obvious loss of privacy. Matching based on speed/bearing as well gets complicated, though!
Summing Up
The key points I want to pull out from the above are:
We should move away from a "redacted trail" model, to a "zones of concern" model. Just as effective & much more privacy protective.
"zones of concern" do not need to be comprised of location/time pairs from the patient's original trail. They could be made up of newly synthesized location/time pairs to better match the matching needs of the HA for a particular environment.
Location/time pairs can & should have a "criticality" associated with them.
Negative criticality may be a useful concept (e.g. in the theater/cinema case, anyone who had their phone on in the middle of the show at this location, was probably not watching the show).
It would be nice to have a much richer model for describing "zones of concern" (e.g. vector-based rather than point-based, and factoring in speed/bearing) to help with cases like bus journeys. But I can't see any way to do that in a manner that would enable encryption, and I am doubtful we are going to be able to invent such techniques quickly.
Plan for MVP1
If we agree on all of the above as our correct overall direction, what’s the bare minimum we need to do for MVP1?
Ensure that “Redaction” guidelines are up-to-date to ensure that all data that is likely to be ineffective for exposure detection (e.g. walking on the street outside) is redacted.
Consider renaming “Redaction” to shift emphasis from privacy towards efficacy. The rename would also apply to the privacy language used towards users.
All data points are stored as one-way hashes, to +/-76m accuracy (see Hashing Details below). Whether or not to include salt in MVP1 is TBC.
Update Safe Paths App to match based on hashed geohashes of:
The recorded GPS point
Points 20m to the N, NE, E, SE, S, SW, W & NW (if these generate different Geohashes)
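The neighbour lookup above could be sketched roughly as follows. This is only an illustration under several assumptions of mine: a 7-character geohash (the +/-76m row of the table under Hashing Details), "20m to the NE" read as 20m along a 45 degree bearing, and a flat-earth conversion from meters to degrees that breaks down near the poles. The function names are hypothetical, and the encoder is the standard public geohash algorithm written out so the block is self-contained.

```python
import math

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
METERS_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Standard geohash: interleave longitude/latitude bisection bits, 5 bits per char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # the first bit always refines longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def candidate_geohashes(lat: float, lon: float,
                        offset_m: float = 20.0, precision: int = 7) -> set:
    """Geohash of the recorded point plus the 8 compass offsets, de-duplicated."""
    hashes = {geohash_encode(lat, lon, precision)}
    meters_per_deg_lon = METERS_PER_DEG_LAT * math.cos(math.radians(lat))
    for bearing_deg in range(0, 360, 45):  # N, NE, E, SE, S, SW, W, NW
        bearing = math.radians(bearing_deg)
        dlat = offset_m * math.cos(bearing) / METERS_PER_DEG_LAT
        dlon = offset_m * math.sin(bearing) / meters_per_deg_lon
        hashes.add(geohash_encode(lat + dlat, lon + dlon, precision))
    return hashes


print(sorted(candidate_geohashes(51.5007, -0.1246)))
```

Using a set means offsets that fall in the same cell as the recorded point add nothing, which matches the "if these generate different Geohashes" condition. For most points well inside a cell the result is a single geohash.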
Update Safe Paths App to log a minimum number of points of concern before generating a notification (suggested default: > 66% of points over a 30 min period)
These parameters to be specified by the HA in their HA JSON file (as a global setting), with guidance provided on what we believe are suitable settings.
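One way the threshold check could work is sketched below. The parameter names are hypothetical (the HA JSON schema for these settings is not defined here), and `min_points` is an extra guard I have assumed so that a single matching point in an otherwise empty window does not count as 100%.

```python
from datetime import datetime, timedelta


def should_notify(points, window=timedelta(minutes=30),
                  min_fraction=0.66, min_points=4):
    """points: list of (timestamp, is_match) tuples, sorted by timestamp.

    Returns True if any window starting at a recorded point contains at
    least min_points points and more than min_fraction of them matched
    a point of concern.
    """
    for i, (t_start, _) in enumerate(points):
        in_window = [matched for t, matched in points[i:] if t - t_start < window]
        if len(in_window) >= min_points and sum(in_window) / len(in_window) > min_fraction:
            return True
    return False


# Example: one point logged every 5 minutes for half an hour.
base = datetime(2020, 5, 1, 12, 0)
mostly_matching = [(base + timedelta(minutes=5 * i), i < 5) for i in range(6)]
alternating = [(base + timedelta(minutes=5 * i), i % 2 == 0) for i in range(6)]
print(should_notify(mostly_matching), should_notify(alternating))
```

In the example, five of six points match in the first case (above 66%) and three of six in the second (below it), so only the first would generate a notification.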
Reduce default exposure time for a point of concern from (0 mins to 4 hours) to (-5 mins to +5 mins). This reflects the fact that we believe that trying to capture fomite transmission will yield too many false positives, so we are only focussing on person-to-person transmission.
Hashing Details
Published data points should be geohashes (less-precise than specific GPS points), and stored as a SHA-256 hash of (geohash, time-bin) (where time-bin is a 5 minute rounded-down time interval in UTC).
Geohash accuracy (this is at the equator, slightly more accurate further from the equator)
Number of digits | Accuracy (m) |
---|---|
6 | +/-610 |
7 | +/-76 |
8 | +/-19 |
9 | +/-2.4 |
https://gis.stackexchange.com/questions/115280/what-is-the-precision-of-a-geohash
For additional security a salt can be added to the hash. Ideally this is:
Specific to a single HA
Changes daily & is not pre-announced
Can be published by the HA alongside the points of concern
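The hashing scheme described in this section might look something like the sketch below. The serialization of (geohash, time-bin) into bytes, the use of epoch seconds for the time-bin, and the way the salt is combined are all assumptions of mine, since none of these are specified yet. The salt value and geohash are made-up examples.

```python
import hashlib
from datetime import datetime, timezone

TIME_BIN_SECONDS = 5 * 60  # 5-minute bins, rounded down, in UTC


def time_bin(ts: datetime) -> int:
    """Round a timezone-aware timestamp down to its 5-minute bin (epoch seconds)."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    epoch = int(ts.timestamp())
    return epoch - epoch % TIME_BIN_SECONDS


def hash_point(geohash: str, ts: datetime, salt: str = "") -> str:
    """SHA-256 of (geohash, time-bin), optionally salted."""
    # Assumed serialization: "salt|geohash|time-bin". The real format is TBC.
    payload = f"{salt}|{geohash}|{time_bin(ts)}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# HA side: publish the hash of a point of concern alongside the day's salt.
salt = "ha-example-2020-05-01"  # hypothetical HA-specific daily salt
published = {hash_point("u4pruyd", datetime(2020, 5, 1, 12, 3, tzinfo=timezone.utc), salt)}

# App side: hash the user's own point with the published salt and test membership.
mine = hash_point("u4pruyd", datetime(2020, 5, 1, 12, 4, 59, tzinfo=timezone.utc), salt)
print(mine in published)
```

Both timestamps in the example fall in the 12:00 bin, so the hashes are identical and the App finds a match without the HA publishing the raw location or time.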
Future Phases - all beyond MVP1
If we deliver MVP1 as above, what would future phases look like? (We can also consider whether any of these is so important it should be in MVP1.)
Variable geohash blurring depending on geography (urban vs. rural) and number of points of concern.
Add a “criticality” value to a point of concern, to allow the contribution a given point of concern makes towards hitting the threshold for notification to be different from the default value.
Add a “time-window” value to a point of concern: to allow the time-window that counts for an overlap to be different from the default value.
Add basic tools to Safe Places to allow “criticality” and “time window” to be set on individual data points.
Add targeted tools to Safe Places to replace user-provided data points with synthetic data points that are optimal for generating user matches, for example:
(e.g.) A “Bus” tool, which traces a bus route with a much finer set of data points, each with a very low “time window”
(e.g.) A “Cinema/Theater” tool, which sets high-criticality points of concern at the start and end times of a given show, and sets negative-criticality points of concern during the middle of the show.
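As a rough sketch of how a “Cinema/Theater” tool and criticality-based scoring might fit together: the class, function names, criticality values and the exact placement of the 10-minute windows (just before the start and just after the end of the show) are all my own assumptions for illustration, and hashing is left out to keep the example short.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class PointOfConcern:
    geohash: str
    time: datetime       # start of the 5-minute time-bin
    criticality: float   # contribution towards the notification threshold


def cinema_points(geohash, show_start, show_end,
                  edge=timedelta(minutes=10), step=timedelta(minutes=5),
                  high=3.0, negative=-1.0):
    """Synthesize points: high criticality around the show, negative during it."""
    points = []
    t = show_start - edge
    while t < show_end + edge:
        in_show = show_start <= t < show_end
        points.append(PointOfConcern(geohash, t, negative if in_show else high))
        t += step
    return points


def exposure_score(points, user_keys):
    """Sum the criticality of every point of concern the user's log matches."""
    return sum(p.criticality for p in points if (p.geohash, p.time) in user_keys)


start = datetime(2020, 5, 1, 19, 0, tzinfo=timezone.utc)
end = datetime(2020, 5, 1, 21, 0, tzinfo=timezone.utc)
points = cinema_points("u4pruyd", start, end)

# Audience member: phone on in the lobby before and after, off during the show.
audience = {("u4pruyd", start - timedelta(minutes=m)) for m in (10, 5)} | \
           {("u4pruyd", end + timedelta(minutes=m)) for m in (0, 5)}
# Someone whose phone was on at this location only in the middle of the show.
mid_show = {("u4pruyd", start + timedelta(minutes=60))}

print(exposure_score(points, audience), exposure_score(points, mid_show))
```

With these example values the audience member accumulates a positive score from the four lobby bins, while the mid-show match contributes a negative score, which is the effect the negative-criticality idea is aiming for.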