Current Status

Current PoC Hashing implementation uses:

Performance data points:

On Safe Places:

Plan of Record

Plan of record was to stick with 8 char geohashes, and increase the cost to ~16 or 17, and to do so in MVP1.

Our goal was ~1 second per hash on a “typical” smartphone, which suggests we want a 10-fold increase.

Hence a cost of 2^15 or 2^16 (8x or 16x the current cost).

That would increase the time to compute 100k hashes on Safe Places to 8k or 16k seconds (roughly 2 to 4 hours). Those figures are still based on Jeff’s laptop; let’s assume the Google servers perform similarly (I’m seeking input from Sherif on this).
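The projection above assumes that scrypt run time scales roughly linearly with the cost parameter N. The sketch below reproduces the arithmetic. The baseline of ~10 ms per hash at N = 2^12 is not a measurement; it is inferred from the figures above (8k seconds for 100k hashes at 2^15), so treat it as an assumption.

```python
# Sketch: project Safe Places publish time from the scrypt cost parameter N.
# Assumptions: run time scales linearly with N, and the baseline of ~10 ms
# per hash at N = 2^12 is inferred from the 8k-seconds-at-2^15 figure,
# not measured.
BASE_COST = 2**12
BASE_MS_PER_HASH = 10


def publish_seconds(n_hashes: int, cost: int) -> float:
    """Estimated wall-clock seconds to hash n_hashes at scrypt cost N = cost."""
    # Integer arithmetic keeps the result exact for power-of-two costs.
    total_ms = n_hashes * BASE_MS_PER_HASH * cost // BASE_COST
    return total_ms / 1000


if __name__ == "__main__":
    for exponent in (12, 15, 16):
        secs = publish_seconds(100_000, 2**exponent)
        print(f"N = 2^{exponent}: {secs:,.0f} s ({secs / 3600:.1f} h)")
```

Under these assumptions, 2^15 gives 8,000 s (2.2 h) and 2^16 gives 16,000 s (4.4 h), matching the 2 to 4 hour range.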

This is in the same ballpark as the 3 hours projected here: How much protection does hashing offer?

However, discussion with the Safe Places dev team has revealed that, while a 3-hour publish step may be acceptable from a requirements point of view, it creates significant issues from an implementation point of view.

We would probably want to split the hashing out into a new, separate service, and we would need to consider what happens when a second hashing request arrives while the first is still being processed, among other issues.
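The concurrency question is left open in the text. One possible policy is sketched below: a separate service runs one hashing job at a time, and a newer publish request supersedes a request that is still queued, while a job already running is allowed to finish. The class and function names, the salt, the input string format, and the scrypt parameters (r=8, p=1, dklen=8) are all hypothetical placeholders, not the Safe Places implementation.

```python
# Sketch of a standalone hashing service that runs one publish job at a time.
# All names, the salt, and the scrypt parameters are hypothetical placeholders.
import hashlib
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Optional


def hash_points(points: List[str], cost: int = 2**12) -> List[str]:
    """Scrypt-hash each location/time string and return hex digests."""
    salt = b"hypothetical-salt"
    return [
        hashlib.scrypt(point.encode(), salt=salt, n=cost, r=8, p=1, dklen=8).hex()
        for point in points
    ]


class HashingService:
    """Serialises hashing jobs on a single worker thread.

    Policy (one option among several): a new request supersedes a request
    that is still queued; a job that is already running is left to finish.
    """

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._lock = threading.Lock()
        self._latest: Optional[Future] = None

    def submit(self, points: List[str]) -> Future:
        with self._lock:
            if self._latest is not None:
                # cancel() only succeeds if the job has not started yet.
                self._latest.cancel()
            self._latest = self._executor.submit(hash_points, points)
            return self._latest

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)
```

A strict first-in, first-out policy would simply drop the `cancel()` call; which behaviour is right depends on whether a newer publish always makes older pending data obsolete.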

For these reasons (and because we are not yet very concerned about security for the early MVP pilots), we are leaving the cost at 4096 = 2^12 for now, and will tackle increasing the cost as a post-MVP1 activity.

Options for post-MVP1

We could move ahead with a cost increase.

Another alternative would be to move to 9 character geohashes.
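For reference, a standard geohash encoder and an approximate cell-size calculation are sketched below, to compare 8- and 9-character geohashes. Each extra character adds 5 bits, so a 9-character cell is 1/32 the area of an 8-character cell. The metres-per-degree constant is a spherical approximation, so the sizes are approximate.

```python
# Sketch: standard geohash encoding plus approximate cell dimensions.
# METRES_PER_DEGREE is a spherical approximation, so sizes are approximate.
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
METRES_PER_DEGREE = 111_320


def encode(lat: float, lon: float, precision: int) -> str:
    """Encode a point as a geohash, interleaving bits starting with longitude."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, value, even = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        even = not even
        bits += 1
        if bits == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def cell_size_m(precision: int, lat: float) -> tuple[float, float]:
    """Approximate (width, height) in metres of a geohash cell at a latitude."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    width = 360 / 2**lon_bits * METRES_PER_DEGREE * math.cos(math.radians(lat))
    height = 180 / 2**lat_bits * METRES_PER_DEGREE
    return width, height


if __name__ == "__main__":
    print(encode(57.64911, 10.40744, 9))
    for precision in (8, 9):
        w, h = cell_size_m(precision, 0)
        print(f"{precision} chars at equator: {w:.1f} m x {h:.1f} m")
```

At the equator this gives roughly 38.2 m x 19.1 m for 8 characters and 4.8 m x 4.8 m for 9 characters. Cell width shrinks with cos(latitude), which is why the number of cells in a fixed-radius circle grows toward the poles.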

We ruled this out previously on the basis that

Looking at this again, though, the argument is not so clear-cut.

(Note that reducing the match radius from 100 ft to 60 ft reduces the match area nearly 3-fold, since area scales with the square of the radius: (100/60)^2 ≈ 2.8. The benefit in avoiding false positives is therefore substantial.)

What are the downsides?

Looking at the numbers in detail:

Latitude (degrees north) | 9-character geohashes in a 20 m radius circle | Increase in cost vs. current average of 2.3 geohashes*
0  | 58  | 25x
30 | 66  | 28x
45 | 78  | 34x
60 | 107 | 46x

*Note that this average of 2.3 actually varies by latitude as well, although not by as much.

Assuming we stick with the ~5 second allowance for a smartphone to compute all hashes (including 2 timestamps), we can afford approximately 100 ms per hash.

To achieve this on lower end phones, we probably need to reduce the hash cost to 2048 = 2^11.
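Whether 2^11 actually fits the ~100 ms budget has to be measured on real devices. The sketch below shows the measurement in Python for convenience; on a phone the same loop would use the app's own scrypt library. The r, p, and dklen values, the salt, and the input string format are assumptions, not the app's actual parameters.

```python
# Sketch: measure mean scrypt time per hash at a given cost N.
# Assumptions: r=8, p=1, dklen=8, the salt, and the input format are
# placeholders; real numbers must come from on-device measurement.
import hashlib
import time


def ms_per_hash(cost: int, samples: int = 5, r: int = 8, p: int = 1) -> float:
    """Average milliseconds per scrypt hash with N = cost."""
    # OpenSSL needs roughly 128 * r * (N + 2 + p) bytes; add 1 MiB of slack
    # so that costs above the default 32 MB memory limit still run.
    maxmem = 128 * r * (cost + 2 + p) + (1 << 20)
    start = time.perf_counter()
    for i in range(samples):
        hashlib.scrypt(f"u4pruydqq|{i}".encode(), salt=b"hypothetical-salt",
                       n=cost, r=r, p=p, maxmem=maxmem, dklen=8)
    return (time.perf_counter() - start) * 1000 / samples


if __name__ == "__main__":
    budget_ms = 100
    for exponent in (11, 12):
        ms = ms_per_hash(2**exponent)
        verdict = "within" if ms <= budget_ms else "over"
        print(f"N = 2^{exponent}: {ms:.1f} ms per hash ({verdict} {budget_ms} ms budget)")
```

Because scrypt time is roughly linear in N, halving the cost from 2^12 to 2^11 should roughly halve the per-hash time on any given device.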

However, given the 30x entropy increase in the location space, this delivers equivalent protection to a hash cost of 65536 = 2^16 with 8 char geohashes.

As per the previous analysis, this is roughly net neutral in terms of hashing costs for the app and the attacker. But it massively reduces Safe Places hashing costs, and gives much more accurate proximity detection.
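The net-neutral argument can be expressed as a simple cost model, sketched below. It assumes that attacker work scales with (candidate cells per unit area) x (scrypt cost), that app work scales with (geohashes in the match circle) x (scrypt cost), using the figures in the table above, and that Safe Places computes a fixed number of hashes per point regardless of geohash length. None of these assumptions is stated explicitly in the text, and the 2.3 average itself varies by latitude.

```python
# Sketch: relative work under a simple cost model (an assumption for
# illustration, not a measured result):
#   attacker work ~ candidate cells per unit area x scrypt cost N
#   app work      ~ geohashes in the match circle x N
#   Safe Places   ~ N (assumed fixed number of hashes per point)

CELL_FACTOR_PER_CHAR = 2**5  # each extra geohash character adds 5 bits

# Geohashes per match circle: 2.3 is the current 8-char average; the 9-char
# figures come from the table, keyed by latitude in degrees north.
CELLS_8_CHAR = 2.3
CELLS_9_CHAR = {0: 58, 30: 66, 45: 78, 60: 107}


def attacker_work(chars: int, cost: int) -> int:
    """Relative brute-force work, with 8-char cell density normalised to 1."""
    return CELL_FACTOR_PER_CHAR ** (chars - 8) * cost


def app_work(cells_in_circle: float, cost: int) -> float:
    """Relative work for the app to hash every geohash in its match circle."""
    return cells_in_circle * cost


if __name__ == "__main__":
    planned, proposed = 2**16, 2**11  # 8-char plan of record vs 9-char proposal
    print("attacker:", attacker_work(9, proposed) / attacker_work(8, planned))
    print("Safe Places speed-up:", planned // proposed)
    for lat, cells in CELLS_9_CHAR.items():
        ratio = app_work(cells, proposed) / app_work(CELLS_8_CHAR, planned)
        print(f"app at {lat} deg N: {ratio:.2f}x the planned 8-char cost")
```

Under these assumptions, attacker work is unchanged (2^5 x 2^11 = 2^16), Safe Places work is 32x lower than the planned 2^16 configuration, and app work ranges from about 0.8x at the equator to about 1.5x at 60° N.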

Looking Ahead - FHE

Long-term, we anticipate moving from Scrypt hashing to an FHE-based solution.

The FHE solution will no longer depend on the entropy of the location space, so the finer resolution does not deliver much security benefit (except perhaps against some hypothetical types of brute-force attacks using rooted phones).

However, the increased number of data points to be checked for matches will create issues.

The volume of computations that need to be performed by “Server 1” in the FHE design will increase by a factor of ~30 (as per the above), as it will have to check 30x as many data points.

This may be an issue for the FHE design. However, the alternative is to stick with 8-character geohashes indefinitely, and to be unable to deliver exposure notifications more accurate than the 100 feet we are delivering in MVP1.

It seems to me that we should make the move to 9-character geohashes, and make it a requirement on the FHE initiative that it cope with this. However, I’d value input on this from those who will be developing the FHE code.

The future - Public Data

From our engagement with the Open Security Summit, we have been getting some strong encouragement to drop the idea of Points of Concern being private, and embrace the idea that they are public data.

The principal argument is as follows:

Should we move to such a model for our “points of concern” data?

There could be some significant benefits of this data being public:

However, we believe that at least some of the data that is acceptable to share with individual users who may be infected could be problematic if it were made completely public.

“Should points of concern data be public?” is probably the wrong question. It is better to ask:

Which “points of concern” data should be public, while we continue to develop technology that allows us to share points of concern with individual infected users without making them public?

We can continue to develop such technology while, in parallel, exploring procedures by which Health Departments could determine that particular data points may be made completely public, and then developing technology to expose this sub-category of points of concern in a plain-text, shareable form.

Proposal for post-MVP1

I propose that, post-MVP1, we do not go ahead with the previously planned increase in the Scrypt hash cost.

Instead, we move to 9-character geohashes with an appropriate Scrypt cost (probably 2048 = 2^11), with the key benefits being:

I also propose that: