Future Enhancements to MVP1 JSON Format
This is a collection of thoughts about how we might extend the proposed MVP1 JSON spec in future
These are definitely post-MVP1, and quite possibly much later, so there is no need for detailed review of these at this point; they are included to show some possible future directions, as these might impact the MVP1 design.
Changes to HA -> App JSON format
New top-level fields - Hashing Info
A further new top-level field is added:
"hashing_details":[
{
"start_time":1589220000, "cost":18, "salt":”hkadkag65l4hat6s”
}
,
{
"start_time":1589265000, "cost":18, "salt":”adkslwd4hadfat6s”
}
]
These are the details of hashing to be used to generate hashes for comparison with entries in this file.
“start_time”, in UTC seconds, indicates when a given set of hashing details applies from (and this set of details ends when superseded by a later “start_time”).
“cost” indicates the power of 2 to input into the scrypt cost function. Each time this increases by 1, the cost to compute a hash approximately doubles, and so does the memory requirement (18 requires ~262MB, and takes about 1 second on a 4-year-old Windows PC).
“salt” is a 16+ character string (typically randomly generated) to use as a salt in the scrypt hashing function
Where these fields are not set, the default values (above) are used.
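As a sketch of how a client might compute such a hash, assuming Python's `hashlib.scrypt` with r=8 and p=1 (assumed parameters), and a hypothetical encoding of the data point (the real MVP1 encoding is defined under “Hash Calculation”):

```python
import hashlib

def location_hash(lat: float, lon: float, time_s: int,
                  salt: str, cost: int) -> str:
    # Hypothetical encoding of the location/time data point; the real
    # MVP1 encoding is defined elsewhere ("Hash Calculation").
    payload = f"{lat},{lon},{time_s}".encode()
    # scrypt with N = 2**cost; r=8 and p=1 are assumed parameters.
    # At cost=18 this needs ~262MB, hence the raised maxmem limit.
    digest = hashlib.scrypt(payload, salt=salt.encode(),
                            n=2 ** cost, r=8, p=1, maxmem=2 ** 30)
    return digest[:8].hex()  # truncate to a 64-bit hash

# A lower cost is used here purely to keep the example fast;
# published data would use the cost from "hashing_details".
h = location_hash(41.24060321, 14.91328448, 1589220000,
                  "hkadkag65l4hat6s", cost=14)
```

Note that with the salt and cost varying per Health Authority, the same data point yields an entirely different hash for each set of hashing details.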
By convention, these details are maintained from one issue of the JSON file to the next, and changes are only made for “start_time”s that are at least 12 hours in the future from the time of publication of the JSON file. This gives clients time to download the file and adjust the hashing that they do on collection of a new data point.
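A minimal sketch of how a client might pick the hashing details in force at a given collection time, assuming the “hashing_details” array above (each entry applies from its “start_time” until superseded by a later one):

```python
def active_hashing_details(hashing_details, t):
    # Entries apply from their "start_time" until superseded by a
    # later "start_time"; pick the latest entry not after t.
    applicable = [d for d in hashing_details if d["start_time"] <= t]
    if not applicable:
        return None
    return max(applicable, key=lambda d: d["start_time"])

details = [
    {"start_time": 1589220000, "cost": 18, "salt": "hkadkag65l4hat6s"},
    {"start_time": 1589265000, "cost": 18, "salt": "adkslwd4hadfat6s"},
]
```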
Hashing Info - Benefits and trade-offs.
The key benefit of the fields above is that they allow Health Authorities to:
Vary the level of privacy protection on their published data
Use unique salts
Change the salt used over time
These are useful for the following reasons.
Iterations
Increasing the cost parameter of the scrypt hash increases the number of iterations performed, and therefore the work required to compute a hash. This makes the data better protected from attackers, but it also means more work for the Safe Places publishing tool to generate the hashes in the first place, and for the Safe Paths App to generate hashes to check against them.
As the volume of data presented by an HA increases, the hashing work required to publish the hashed data will increase in proportion, while the privacy concerns around the data diminish - therefore it may be useful for Health Authorities to reduce the hash cost to reflect this change.
Additionally, it may be useful for a Health Authority to use a number of iterations different from the Path Check default, either because they have a higher concern for privacy, or because they want to reduce the computational costs in publishing the data, and are less concerned about privacy.
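The doubling described above is simple arithmetic; as a sketch (taking cost 18, ~1 CPU second per the figures earlier, as the baseline):

```python
def relative_hash_cost(cost: int, baseline: int = 18) -> float:
    # Each +1 in "cost" roughly doubles both CPU time and memory
    # relative to the baseline cost.
    return 2.0 ** (cost - baseline)
```

So a Health Authority choosing cost 20 would pay roughly 4x the default compute and memory per hash, while cost 16 would pay roughly a quarter.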
Unique Salts
For an HA to use unique salts offers their published data a little more protection.
Any brute-force attack on the Health Authority’s data must incorporate that salt, and therefore must be targeted at that specific Health Authority - rather than at the service more generally.
Given that Health Authorities typically don’t have overlapping geography, this does not make much difference (any brute-force attack must be targeted by geographic area in any case). However, allowing unique salts for HAs also gives them the ability to change those salts over time.
Changing Salts
The value to a Health Authority of changing the salt that they use is that it automatically disarms any brute force attack that may have been developed against that Health Authority’s data.
If an attacker builds a database of hashes to read the Health Authority's data, it will already be time-bounded (since hashes are different for every time). But to the extent that the database extends into the future, the Health Authority can disarm it by moving to a new salt.
The simple fact that a Health Authority has this capability should dissuade anyone from building and publishing in advance large tables of forward-looking hashes for the Health Authority's data.
We expect it will be good practice to move to a new salt every 24 hours or so, and we expect to build this into Safe Paths when the function is available.
Trade-offs
There are a couple of trade-offs with this function, which explain why we aren’t implementing it for MVP1.
Some significant additional function is required in the Safe Paths App:
The ability to read this hashing info, and store suitable hashes for each Health Authority, in line with the hashing info shared by that Health Authority. This means storing multiple hashes per data point, one for each Health Authority.
Adding a new Health Authority becomes more complicated. At this point the App will have to work through its entire existing location database, computing hashes that match the new Health Authority’s hashing info. That’s a computationally expensive task that will need to be run in the background over an extended period.
A similar issue occurs if the App misses a download of a JSON file, and therefore learns about a new hashing info “start_time” after that time has passed. In this case, the App has to work through the data recorded after that start time, and recompute all the hashes. If HAs publish new hashing info > 12 hours in advance, this will be a rare occurrence, but it nevertheless is one we should handle in the App.
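A sketch of how the App might select the points needing background re-hashing after learning of a change late (assuming each recorded point stores a "time" field in UTC seconds):

```python
def points_to_rehash(points, new_start_time, now):
    # Points recorded at or after the new "start_time" need their
    # hashes recomputed with the new hashing details.
    return [p for p in points if new_start_time <= p["time"] <= now]
```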
This complexity in the App explains why this is not in plan for MVP1. We will look at scheduling this at some point in the future, when it is clear that it is required.
New points of concern fields - Criticality & Exposure Duration
As detailed in the article “What do Health Authorities really want from Exposure Notifications?”, it may be useful to HAs in future to allow “criticality” and “time window” values to be associated with individual points of concern.
We could extend the syntax for “concern points” like this:
"concern_points":[
{
"hash":“87e916850d4def3c”, "criticality":3, "time_window":60
}
]
“criticality” is used to indicate a very high risk data point. A match with this data point should count as if it were multiple matches, as defined by criticality, thereby triggering an exposure notification even in the absence of other matches in the surrounding area.
Careful design thought is needed as to exactly how this will interact with “notification_threshold_percent” and “notification_threshold_count” in a variety of scenarios.
Negative criticality might be useful in some cases, to indicate that if someone was observed at a particular point, they are probably not at risk - see the “cinema” case in the Exposure Notifications article. Again more thought & design work needed before we implement this.
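One possible (hypothetical) way criticality could feed into match counting, before the notification thresholds are applied: treat each matched point as worth its “criticality” (defaulting to 1 when absent), with negative values subtracting from the total:

```python
def effective_match_count(matched_points):
    # Each matched concern point counts as its "criticality", defaulting
    # to 1; negative values (the "cinema" case) reduce the total.
    return sum(p.get("criticality", 1) for p in matched_points)
```

This is only a sketch of one option; the interaction with the threshold fields still needs the design work described above.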
“time_window” indicates the length of time (in minutes) after this data point that should be considered for exposure purposes.
Note: given the current hashing design, this will be computationally expensive, as hashes will need to be computed for every 5-minute period covered (each hash taking 1-4 CPU seconds). Careful consideration is needed as to whether this computational cost will be acceptable. We could reduce this cost only by weakening our slow hash, which would have other consequences.
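To illustrate the scale of that cost (one hash per 5-minute period covered by the window, using the 1-4 CPU seconds per hash figure above):

```python
def time_window_hash_cost(window_minutes: int,
                          secs_per_hash: int = 1,
                          period_minutes: int = 5):
    # One extra hash must be computed for each 5-minute period covered
    # by the window; the CPU estimate uses the per-hash figure above.
    periods = window_minutes // period_minutes
    return periods, periods * secs_per_hash
```

A 60-minute window therefore means 12 extra hashes per concern point, i.e. roughly 12-48 CPU seconds at 1-4 seconds per hash.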
Changes to App -> HA JSON format
As well as changing the JSON data for Exposure Notifications, we will also change the JSON data exposed by the App when the user chooses to share their location with the HA. This will allow Safe Places to use hashes computed by the App, rather than having to compute all hashes itself.
Currently the data consists of a series of points like this:
{
  "longitude": 14.91328448,
  "latitude": 41.24060321,
  "time": 1589117739000
}
Single Hash
If parameters (cost & salt) are not controllable by HAs, we can just add a single hash value like this:
“hash”: the scrypt-generated 64-bit Hash value of the location + time data point, as described above (See: “Hash Calculation”).
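A shared point would then look something like this (the hash value here is purely illustrative):

```json
{
  "longitude": 14.91328448,
  "latitude": 41.24060321,
  "time": 1589117739000,
  "hash": "87e916850d4def3c"
}
```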
Multiple Hashes
In future, if we implement the function to enable HAs to set their own costs & salts (see above), we could expand the format to allow multiple sets of hash data. Each point would be expanded to include an array of hashes (there could be zero, one or more of these).
The new fields would be as follows:
“hash_data”: an array of hashes of the data point, with details as below. Initially, there is expected to be exactly one entry. In future, the App may receive different “cost” and “salt” parameters from different Health Authorities, and therefore may compute multiple hashes for each data point. When it does this, it will include a set of hash data for each hash that it has computed.
“cost”: the cost of the hash - initially this will always be a fixed value (exact value TBC but currently expected to be 18) - in future we expect some Health Authorities to set this to other values.
“salt”: the salt used to generate the hash. Initially this will always be “LetsDefeatCovid-19” - in future we expect some Health Authorities to set this to other values.
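A point carrying hashes for two Health Authorities might then look like this (hash values and the second salt are purely illustrative):

```json
{
  "longitude": 14.91328448,
  "latitude": 41.24060321,
  "time": 1589117739000,
  "hash_data": [
    {"hash": "87e916850d4def3c", "cost": 18, "salt": "LetsDefeatCovid-19"},
    {"hash": "3b9f2a10c4e8d711", "cost": 18, "salt": "hkadkag65l4hat6s"}
  ]
}
```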
There is no good reason to implement this additional change to the format in advance of the function to allow HAs to set their own costs and salts (which may never be implemented).
When we do implement that function, it will be straightforward to support backward compatibility in Safe Places simply by interpreting a single “hash” field as equivalent to a single-entry “hash_data” array, with the default “cost” & “salt”.
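A sketch of that backward-compatibility rule as Safe Places might apply it (default values as above; function and field names here are assumptions for illustration):

```python
DEFAULT_COST = 18
DEFAULT_SALT = "LetsDefeatCovid-19"

def normalize_hash_data(point):
    # A legacy single "hash" field is treated as equivalent to a
    # one-entry "hash_data" array with the default cost and salt.
    if "hash_data" in point:
        return point["hash_data"]
    if "hash" in point:
        return [{"hash": point["hash"],
                 "cost": DEFAULT_COST, "salt": DEFAULT_SALT}]
    return []
```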