Hashing of GPS points - Design
Recording GPS data in the Safe Paths App
When the app records GPS data, it records the following in the local secure database
lat / long / timestamp, as today
A series of hashes computed as follows:
For this time window + 1 “nearby” time window…
For this geohash + N “nerby” geohashes (there may be 0, 1 or many of these - see below)…
A scrypt hash computed from the (geohash, time window) pair, as per details below.
In total, if there are N “nearby” geohashes, a total of 2 x (N + 1) hashes will be stored. These can be stored in an unordered list (i.e. there is no need to store any info on which is which).
Time windows
The “time window” is a UTC timestamp in seconds, rounded down to the nearest 5 minutes.
E.g. Time: 14 April 2020, 12:03:12pm (which rounds down to 12:00:00pm), which maps to 1586865600000 (msecs)
The “nearby” time window is defined as follows:
If the timestamp is in the 1st half of the time window (e.g. 12:01pm) it is the preceding time window (begins 11:55am)
If the timestamp is in the 2nd half of the time window (e.g. 12:03pm), it is the following time window (begins 12:05pm)
The exact midpoint of the 5 min window (2:30) is treated as being in the 2nd half.
Nearby Geohashes
We determine N “nearby” geohashes as follows:
If the point is latitude X, longtiude Y, we compute geohashes for the following points:
N: X, Y+1e-04
NE: X+7e-05, Y+7e-05
E: X+1e-04, Y
SE: X+7e-05, Y-7e-05
S: X, Y-1e-04
SW: X-7e-05, Y-7e-05
W: X-1e-04, Y
NW: X-7e-05, Y+7e-05
This is approximately a circle of radius 10m, using an approximate that 1e-05 degrees is 1m. In fact it is 1.1m of longitude and 1.1m to 0.7m of latitude (from the equator to 50 degrees north). Given the overall crudeness of geohash accuracy, and inaccuracy inherent in our GPS position, we consider this “good enough” for MVP1.
Beyond 50 degrees north calculations get a bit more complicated, which we’ll worry about beyind MVP1).
This picture (not necessarily to scale) shows the geohashes that would be considered “nearby” to the central geohash (green), based on the blue GPS position dot.
The Hash Calculation
We use a scrypt hash with the following parameters:
A “cost” (N) that is to be determined. For initial implemention we use 2^12 = 4096. For improved security we expect to move to 2^16,. 2^17 or 2^18 for production, but the exact value still needs to be determined.
A block size (r) of 8 - this is the default.
Parallelization (p) of 1 - this is the default.
A keylen (output) of 8 bytes = 16 hex digits.
As a salt, we use the 4 charaters “salt” (lower case) - or the ASCII values of these letters for scrypt implementations that require a byte array.
For the text to be encoded, we use
The 8 char geohash
Followed by the UTC timestamp of the beginning of the relevant 5 minute time window, in milliseconds
Example:
Latitude: 51.5019, Longitude: -0.1415
Time: 14 April 2020, 12:03:12pm (which rounds down to 12:00:00pm)
Gives us:
8 character geohash: gcpuuz8u
UTC timestamp: 158686560000
Data string to encode: “gcpuuz8u1586865600000” which should hash to “c9414c55812d796a“ (cost = 12)
Sending data to Safe Places for Redaction Processing
To keep implementation costs down for MVP1, there is no change to this interface.
Therefore, we continue to send the JSON data exactly as todaY:
{"longitude": 14.91328448, "latitude": 41.24060321, "time": 1589117739000 }
In future we may make an optimization where:
Safe Paths sends the already-computed hash to Safe Places as an additional file in the JSON location data shared
Safe Places stores this off & uses this hash, where available, when generating JSON “points of concern” data.
For MVP1, the value of this optimization is low, as the expected rollout of the app will be low. Therefore we save implementation costs now, and will add this later.
Note that none of the “nearby” geohash or time window calculations performed by the App are shared with Safe Places - just the originally recorded GPS data point & timestamp.
Safe Places Publishing of Data
When publishing points of concern JSON data, there are two key changes:
Points of concern
Notification sensitivity controls
Points of concern
The current lat/long/timestamp format is replaced by just a hash.
So this:
"concern_points": [{
"time": 1589224427000,
"longitude": 1.00000919,
"latitude": 2.00000943
}]
becomes this:
"concern_point_hashes":[“87e916850d4def3c”]
and an example with multiple hashes looks like this:
"concern_point_hashes": ["e090295d8e58b7de", "92c0e4f2d757c2a8", "755d22f05c13397a"]
The value of the hash is computed in exactly the same was as on Safe Patths (see above). Note that changing the cost value will result in changes to the hash values produced, so as we tune the cost value to the optimal value, this will require careful co-ordination between Safe Places and Safe Paths.
(potentially for compatibility, Safe Places could output multiple hash values, with different cost values, for each data point - Safe Paths will simply fail to match on any hashes computed with the wrong hash value - while collisions are theoretically possible in practise the odd are almost zero).
For compatibility reasons as we work through this change, our plan is to add the hash first, and then remove the other points later - so for an interim period during MVP1 development, we may output data like this:
This will allow both old & new versions of the app to work with Safe Places. But we aim to move to the final format before we launch MVP1.
Notification of sensitivity controls
Two new top-level fields are added at the top of the HA JSON file.
These two parameters work together to control the sensitivity of exposure notifications for this data. The Health Authority can turn these based on their experience and feedback from their community.
With the defaults, we trigger an exposure notification if we get more than 66% of location matches, across any 30 minute period (i.e. if we match 4 or more of 6 points).
Safe Paths logic for matching points of concern
On receipt of the “points of concern” data from the Health Authority, Safe Paths:
Stores the notification sensitivity controls for use during matching
Stores all the reported hash values in a data structure that allows for fast lookups by hash value (e.g. a tree).
Safe Paths then looks for matches on points of concern as follows:
It iterates through each data point stored in its local database.
For each of the 2 x (N+1) hashes stored with that data point, it looks for a match in the HA’s declared “points of concern”.
If it identifies a match, then it marks that data point as having matched a point of concern for that HA. (it must record the specific HA so that it can apply the correct notiifcation sensitivity cotrols).
Having performed this matching on all local data points, it then searches for notifiable events within the local data set.
For each data point that matches an HA’s “point of concern”, it inspects the next K local data points in the local database (where K is the supplied “notification threshold count”). If more than M% of these matched “points of concern” with the same Health Authority (where M is the “notification threshold percent”), then all K data points are classified as part of the exposure notification for that day.
To determine the duration of the exposure notification for a given day, this calculation is performed for all data points in the day - with all data points that are classified as part of an exposure contributing 5 minutes to the total exposure reported for the day.
There can only be one exposure notification in a day, and the miniumum exposure perios is (5 mins x “notification threshold count”)
See more details here: Exposure Notifications more detailed design
Related Work
In MVP1, under SAF-397 / MVP1#17, we will be doing work to reduce the volume of data downloaded by the Mobile App from Same Places.
Specifically under SAF-275 / PLACES-83, we plan to break the HA JSON file into several pieces, so that each 12 hour publication event results in a new file, rather than additions to an existing file.
Safe Paths will download one or more of these files - typically just one, but in the case that Safe Paths was offline for an extended period it could be more than one..
The Safe Places publishing activity & the Safe Paths handling of points of concern will need to accommodate these changes.
This is expected to be handled by SAF-275 and PLACES-83, but with both developments going on in parallel, there is a risk of conflicts, and so the teams should co-ordinate their designs & activity together.