Recording GPS data in the Safe Paths App

When the app records GPS data, it records the following in the local secure database

If there are N “nearby” geohashes, a total of 2 x (N + 1) hashes will be stored. These can be stored in an unordered list (there is no need to record which is which).

Time windows

The “time window” is a UTC timestamp in seconds, rounded down to the nearest 5 minutes.
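As a minimal sketch of this rounding (the function names here are illustrative, not taken from the app's code), the window can be computed with integer arithmetic. The second helper assumes the input is a millisecond timestamp like the "time" values in the JSON data:

```python
WINDOW_SECONDS = 5 * 60  # length of one time window, in seconds


def time_window(utc_seconds: int) -> int:
    """Round a UTC timestamp (seconds) down to the start of its 5-minute window."""
    return utc_seconds - (utc_seconds % WINDOW_SECONDS)


def time_window_from_millis(utc_millis: int) -> int:
    """Same rounding, for a timestamp given in milliseconds (hypothetical helper)."""
    return time_window(utc_millis // 1000)


print(time_window(1586865725))                 # 1586865600
print(time_window_from_millis(1589117739000))  # 1589117700
```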

The “nearby” time window is defined as follows:

The exact midpoint of the 5 min window (2:30) is treated as being in the 2nd half.

Nearby Geohashes

We determine N “nearby” geohashes as follows:

This is approximately a circle of radius 10m, using the approximation that 1e-05 degrees is 1m. In fact, 1e-05 degrees is about 1.1m of latitude, and between 1.1m and 0.7m of longitude (from the equator to 50 degrees north). Given the overall crudeness of geohash accuracy, and the inaccuracy inherent in our GPS position, we consider this “good enough” for MVP1.

Beyond 50 degrees north, the calculations get a bit more complicated, which we’ll worry about beyond MVP1.
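A quick way to check the size of a 1e-05 degree step at different latitudes is a spherical-Earth estimate. This is only a rough sketch: the mean Earth radius and the function name are assumptions made for illustration, and the app itself does not need to run this calculation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)


def metres_per_step(latitude_deg: float, step_deg: float = 1e-5) -> tuple[float, float]:
    """Return (north-south metres, east-west metres) covered by step_deg at a latitude."""
    metres_per_degree = math.pi * EARTH_RADIUS_M / 180
    north_south = step_deg * metres_per_degree
    # East-west distance shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for lat in (0, 25, 50):
    ns, ew = metres_per_step(lat)
    print(f"{lat:>2} deg: {ns:.2f} m north-south, {ew:.2f} m east-west")
```

The north-south distance stays near 1.1m at every latitude, while the east-west distance falls to roughly 0.7m at 50 degrees north.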

This picture (not necessarily to scale) shows the geohashes that would be considered “nearby” to the central geohash (green), based on the blue GPS position dot.

The Hash Calculation

We use a scrypt hash with the following parameters:

As a salt, we use the 4 characters “salt” (lower case) - or the ASCII values of these letters, for scrypt implementations that require a byte array.

For the text to be encoded, we use

Example:

Gives us:

Data string to encode: “gcpuuz8u1586865600”
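The following is a hedged sketch of the hash step using Python's standard `hashlib.scrypt` (which needs a Python build linked against OpenSSL 1.1 or later). The data string is built as in the example above: the geohash followed by the time window. The scrypt parameters `n`, `r`, `p` and `dklen` below are placeholders chosen only so that the code runs; they are assumptions, not the project's values, and should be replaced with the parameters specified for Safe Paths. `dklen=8` is picked because it yields 16 hex characters, the same length as the example hash shown in the published format.

```python
import hashlib


def data_string(geohash: str, time_window: int) -> str:
    """Concatenate the geohash and the time window, as in the example above."""
    return f"{geohash}{time_window}"


def location_hash(geohash: str, time_window: int, *,
                  n: int = 4096, r: int = 8, p: int = 1, dklen: int = 8) -> str:
    """Return the scrypt hash of the data string as lower-case hex.

    n, r, p and dklen are PLACEHOLDER values (assumptions), not the
    parameters defined for Safe Paths.
    """
    data = data_string(geohash, time_window).encode("ascii")
    # The salt is the 4 ASCII characters "salt", passed as bytes.
    digest = hashlib.scrypt(data, salt=b"salt", n=n, r=r, p=p, dklen=dklen)
    return digest.hex()


print(data_string("gcpuuz8u", 1586865600))  # gcpuuz8u1586865600
print(location_hash("gcpuuz8u", 1586865600))
```

Because the placeholder parameters may differ from the real ones, the printed hash should not be compared against hashes produced by the app or by Safe Places.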

Sending data to Safe Places for Redaction Processing

To keep implementation costs down for MVP1, there is no change to this interface.

Therefore, we continue to send the JSON data exactly as today:

{"longitude": 14.91328448, "latitude": 41.24060321, "time": 1589117739000}

In future we may make an optimization where:

For MVP1, the value of this optimization is low, as the expected rollout of the app will be low. Therefore we save implementation costs now, and will add this later.

Note that none of the “nearby” geohash or time window calculations performed by the App are shared with Safe Places - just the originally recorded GPS data point & timestamp.

Safe Places Publishing of Data

When publishing points of concern JSON data, there are two key changes:

Points of concern

The current lat/long/timestamp format is replaced by just a hash.

So this:

"concern_points": [
  {
    "time": 1589224427000,
    "longitude": 1.00000919,
    "latitude": 2.00000943
  }
]

becomes this:

"concern_points": [
  {
    "hash": "87e916850d4def3c"
  }
]

For compatibility reasons as we work through this change, our plan is to add the hash first, and then remove the other fields later - so for an interim period during MVP1 development, we may output data like this:

"concern_points": [
  {
    "time": 1589224427000,
    "longitude": 1.00000919,
    "latitude": 2.00000943,
    "hash": "87e916850d4def3c"
  }
]

This will allow both old & new versions of the app to work with Safe Places. But we aim to move to the final format before we launch MVP1.
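As an illustration of how a newer client might read either the interim or the final format, the sketch below collects only the "hash" values and ignores any other fields. The function name and the surrounding top-level object are assumptions made for the example; the actual Health Authority file contains further fields.

```python
import json


def concern_hashes(document: str) -> set[str]:
    """Collect the "hash" value of every concern point that has one."""
    payload = json.loads(document)
    return {
        point["hash"]
        for point in payload.get("concern_points", [])
        if "hash" in point  # points in the old format carry no hash and are skipped
    }


interim = """
{
  "concern_points": [
    {
      "time": 1589224427000,
      "longitude": 1.00000919,
      "latitude": 2.00000943,
      "hash": "87e916850d4def3c"
    }
  ]
}
"""
print(concern_hashes(interim))  # {'87e916850d4def3c'}
```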

Notification sensitivity controls

Two new top-level fields are added to the Health Authority (HA) JSON file.

"notification_threshold_percent":66,
"notification_threshold_count":6

These two parameters work together to control the sensitivity of exposure notifications for this data. The Health Authority can tune these based on their experience and feedback from their community.

With the defaults, we trigger an exposure notification if we get more than 66% of location matches, across any 6 consecutive location data points (i.e. if we match 4 or more).
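The arithmetic behind "4 or more" can be sketched as follows, reading "more than 66%" as a strict comparison (an interpretation of the text above). The function name is illustrative only.

```python
def min_matches_to_notify(threshold_percent: int, threshold_count: int) -> int:
    """Smallest number of matches that is strictly more than threshold_percent
    of threshold_count data points."""
    # Integer form of: matches * 100 > threshold_percent * threshold_count
    return (threshold_percent * threshold_count) // 100 + 1


print(min_matches_to_notify(66, 6))  # 4, since 4/6 is about 66.7% and 3/6 is 50%
```

Using integer arithmetic avoids floating-point rounding at the boundary. Note that under this strict reading a threshold of 100 percent could never be exceeded.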

Safe Paths logic for matching points of concern

On receipt of the “points of concern” data from the Health Authority, Safe Paths:

Safe Paths then looks for matches on points of concern as follows:

Having performed this matching on all local data points, it then searches for notifiable events within the local data set.

To illustrate, this diagram shows a series of data points, some of which match, and some of which do not match any points of concern. The brackets on the left indicate the groups of 6 data points that match more than 66% (4 points). The brackets on the right show the overall Exposure Duration that is derived.
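The grouping shown by the brackets can be sketched as a sliding window over the per-point match results. This is a rough illustration rather than the app's implementation: the function name is hypothetical, and merging overlapping qualifying windows into one span is an assumption about how the groups combine. Turning a span into an Exposure Duration (from the timestamps of its data points) is left to the rules described in this document.

```python
from typing import List, Tuple


def notifiable_spans(matched: List[bool],
                     threshold_percent: int = 66,
                     threshold_count: int = 6) -> List[Tuple[int, int]]:
    """Return (first_index, last_index) spans covered by qualifying windows.

    matched[i] is True when local data point i matched a point of concern.
    """
    spans: List[Tuple[int, int]] = []
    for start in range(len(matched) - threshold_count + 1):
        window = matched[start:start + threshold_count]
        # Strictly more than threshold_percent of the window must match.
        if sum(window) * 100 > threshold_percent * threshold_count:
            end = start + threshold_count - 1
            if spans and start <= spans[-1][1]:
                # Overlaps the previous span: extend it (assumption).
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


points = [False, True, True, False, True, True, True,
          False, False, False, False, False]
print(notifiable_spans(points))  # [(0, 7)]
```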

Normally, the data points will be ~5 minutes apart. Where data points are missing, so that consecutive points are more than 5 minutes apart, this does not need to affect the calculation - the same calculation can be performed, and the Exposure Duration may be calculated in the same way.

In some rare cases, if a data point right at the end of the Exposure Duration were to be delayed for a long time (e.g. hours), this would result in an overstated Exposure Duration. That is a characteristic we can live with - we are actively working to reduce the frequency of this kind of occurrence (SAF-257 / SAF-278).

Related Work

In MVP, under