Moving Safe Paths & Safe Places to use geohashes natively

After some discussion we decided not to move to “native” geohashes. The products still use lat/long natively, and only convert to geohashes for location matching.

For MVP1 we decided this was an unnecessary level of change, and we aren’t yet confident that geohashes will be what we use long term.

I recently laid out a proposed set of changes to our JSON formats client->HA and HA->client for security.
Design for MVP1 HA JSON Changes

 

While I introduced hashed geohashes, I left the non-hashed location data in lat/long format.

I’m starting to think we should make a shift to use geohashes natively everywhere, i.e.

  • Only store geohashes in Safe Paths

  • Data export to Safe Places contains geohashes, not lat/long

  • Safe Places data points represented & stored as geohashes

Benefits of geohashes:

  • Contains implicit accuracy info (based on number of characters / specificity)

  • Does not contain spuriously accurate data. Publishing geohashes may increase confidence in our data: (1) a stationary user won’t see the signal jumping around by a few metres here & there, but will see a consistent position reported; (2) it creates the appearance that we know the limitations of our data accuracy & have engineered for this.

  • Super-easy to blur (just drop the last character)

  • More compact than lat/long

  • Just as easy to look up - e.g. https://www.movable-type.co.uk/scripts/geohash.html

  • Easy to compare (just literally compare the hashes).
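To make the last few bullets concrete, here is a minimal Python sketch of blurring and comparing geohashes with plain string operations. The function names (`blur`, `same_cell`) and the sample geohash strings are hypothetical, made up for illustration rather than taken from our code or data.

```python
def blur(geohash: str, drop: int = 1) -> str:
    """Return a coarser geohash by dropping the last `drop` characters."""
    if not 0 <= drop < len(geohash):
        raise ValueError("drop must leave at least one character")
    return geohash[: len(geohash) - drop]


def same_cell(a: str, b: str, precision: int) -> bool:
    """True if both geohashes fall in the same cell at `precision` characters."""
    if len(a) < precision or len(b) < precision:
        raise ValueError("both geohashes need at least `precision` characters")
    return a[:precision] == b[:precision]


# Illustrative values only (not real recorded points).
point_a = "drt2zp2m"
point_b = "drt2zp2q"

print(len(point_a))                    # character count = implied accuracy level
print(blur(point_a))                   # the enclosing 7 char cell
print(same_cell(point_a, point_b, 8))  # compared as 8 char cells
print(same_cell(point_a, point_b, 7))  # compared as 7 char cells
```

A shared prefix only tells you that two points sit in the same cell; two points a metre apart can still land in different cells if a cell boundary runs between them.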

Downside:

  • Less precise, especially when the recorded data point is right at the edge or corner of a geohash cell. The probable workaround for this is to store a handful of additional “nearby” geohashes, i.e. the other geohashes that fall within 20m of the original GPS data point.
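A rough Python sketch of how the “nearby” geohashes could be computed, assuming the standard geohash encoding and a flat-earth approximation for small offsets. The helper names, the 5m sampling step and the metres-per-degree constant are my assumptions for illustration. Sampling on a grid is an approximation: it can miss a cell that only just clips the 20m circle, and the longitude conversion breaks down very close to the poles.

```python
import math

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
_METRES_PER_DEGREE = 111_320.0  # approximate length of one degree of latitude


def encode(lat: float, lon: float, precision: int = 8) -> str:
    """Standard geohash: interleave longitude/latitude bits, 5 bits per character."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # 5 bits make one base-32 character
            chars.append(_BASE32[value])
            value = 0
            bit_count = 0
    return "".join(chars)


def nearby_geohashes(lat: float, lon: float, radius_m: int = 20,
                     step_m: int = 5, precision: int = 8):
    """Return (main geohash, sorted other geohashes hit by sample points within radius_m)."""
    main = encode(lat, lon, precision)
    metres_per_deg_lon = _METRES_PER_DEGREE * math.cos(math.radians(lat))
    found = set()
    for dx in range(-radius_m, radius_m + 1, step_m):      # metres east
        for dy in range(-radius_m, radius_m + 1, step_m):  # metres north
            if dx * dx + dy * dy > radius_m * radius_m:
                continue  # sample lies outside the circle
            gh = encode(lat + dy / _METRES_PER_DEGREE,
                        lon + dx / metres_per_deg_lon, precision)
            if gh != main:
                found.add(gh)
    return main, sorted(found)


# Illustrative coordinates in central Boston (not a real recorded point).
main, nearby = nearby_geohashes(42.3601, -71.0589)
print(main, nearby)
```

Because an 8 char cell is only about 19m tall, a 20m radius always reaches at least one neighbouring cell, so every stored point would carry at least one “nearby” geohash.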

 

Key decisions to take:

  • Do we want to make this change? It seems like a no-brainer for Safe Paths, but less clear for Safe Places: how would it impact what we expose in GUIs? That may be a significant change to take on…

  • If we do, what accuracy should we use…?

 

On accuracy, at the equator geohash cell sizes are approximately as follows:

  • 7 char: width 150m x height 150m

  • 8 char: width 38m x height 19m

  • 9 char: width 5m x height 5m

Widths narrow as you move away from the equator.
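The figures above can be reproduced with a few lines of Python. This is a sketch under simple assumptions: a geohash carries 5 bits per character, split between longitude and latitude with longitude getting the extra bit when the total is odd, and one degree is taken as roughly 111,320m (a spherical-earth approximation). The function name `cell_size_m` is my own.

```python
import math

METRES_PER_DEGREE = 111_320.0  # approximate; assumes a spherical earth


def cell_size_m(precision: int, lat_deg: float = 0.0):
    """Approximate (width, height) in metres of a geohash cell at a given latitude."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    width_deg = 360.0 / 2 ** lon_bits
    height_deg = 180.0 / 2 ** lat_bits
    width_m = width_deg * METRES_PER_DEGREE * math.cos(math.radians(lat_deg))
    height_m = height_deg * METRES_PER_DEGREE
    return width_m, height_m


for chars in (7, 8, 9):
    w, h = cell_size_m(chars)
    print(f"{chars} char at equator: {w:.1f}m x {h:.1f}m")

# Widths shrink with the cosine of the latitude; heights stay the same.
w, h = cell_size_m(8, lat_deg=42.36)  # roughly Boston's latitude
print(f"8 char near Boston: {w:.1f}m x {h:.1f}m")
```

At the equator this gives roughly 153m x 153m, 38m x 19m and 4.8m x 4.8m for 7, 8 and 9 characters, consistent with the rounded figures above. At Boston’s latitude an 8 char cell comes out at about 28m wide and 19m tall.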

 

Here’s a 7 char geohash (a location in central Boston)

And here’s an 8 char geohash in the same location. As you can see, an 8 char geohash roughly corresponds to a small building, business or restaurant. A 7 char geohash in an urban area covers dozens of similar businesses (not surprising, as it is 32x the size).

And here’s a 9 char geohash - a 5m x 5m square.

I think that 7 char geohashes will cause big problems with false positives, even with the changes we are introducing to require sustained matches over an extended period.

If the Contact Tracers want to find anyone who spent 30 mins at the Boston Burger Company, they can’t do so without also dragging in people who spent 30 mins at a dozen other businesses: Cafe Sushi, Karma Gym etc.

If we presented 7 char geohashes to contact tracers in the Safe Places UI, I think we’d get immediate feedback that this is not viable.

If we instead present data points (or smaller geohashes) in Safe Places, but match under the covers on the basis of a 7 char geohash, then we’re simply masking the problem: we may get less negative immediate feedback from Health Authorities, but we’ll get the same number of false positives.

8 char geohashes, on the other hand, look ideal for picking out an individual location such as a restaurant. I imagine that Contact Tracers using Safe Places would be happy defining points of concern in terms of 8 char geohashes.

What’s the case for 7 char geohashes?

Our current plan of record is to use 7 char geohashes. This was motivated by 2 concerns:

  • Privacy, where we believed using blurred location points would be privacy-protective.

  • GPS inaccuracy, where we felt that using blurred location points would mask our problems to some extent.

Further analysis has shown that 7 char geohashes actually make privacy worse, not better, as they remove entropy, and make the “points of concern” data 32x easier to crack open.
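The 32x figure follows from each geohash character carrying 5 bits: one fewer character means 32 times fewer candidate values for an attacker to try. The Python sketch below illustrates this with a brute-force search over all geohashes under a known coarse prefix. It assumes, purely for illustration, that points of concern are published as unsalted SHA-256 digests of the geohash string; the actual hashing scheme may well differ, and the names `candidates` and `crack` plus the sample geohash are hypothetical.

```python
import hashlib
from itertools import product

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def candidates(prefix: str, precision: int):
    """Yield every geohash of length `precision` that starts with `prefix`."""
    for tail in product(BASE32, repeat=precision - len(prefix)):
        yield prefix + "".join(tail)


def crack(target_digest: str, prefix: str, precision: int):
    """Brute-force a hashed geohash. Returns (geohash or None, attempts made)."""
    attempts = 0
    for gh in candidates(prefix, precision):
        attempts += 1
        # ASSUMPTION: unsalted SHA-256 of the geohash string (illustrative only).
        if hashlib.sha256(gh.encode("ascii")).hexdigest() == target_digest:
            return gh, attempts
    return None, attempts


# Suppose the attacker knows the city-scale 5 char cell (illustrative value).
known_prefix = "drt2z"
secret = "drt2zp2"  # a 7 char point of concern, made up for this example
digest = hashlib.sha256(secret.encode("ascii")).hexdigest()

found, attempts = crack(digest, known_prefix, precision=7)
print(found, attempts)

# Search space under the same prefix: 32**2 for 7 chars vs 32**3 for 8 chars.
print(32 ** (7 - len(known_prefix)), 32 ** (8 - len(known_prefix)))
```

Under these assumptions the 7 char search space beneath a 5 char prefix is 1,024 candidates against 32,768 for 8 chars. Either is trivial to enumerate, which is why the hashing scheme matters far more than the blurring.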

GPS inaccuracy continues to be a major concern, but it is not clear to me that we’ll be able to adequately mask the issues we have here:

  • As per above, even with 30 mins overlap required for exposure notification, a 7 char geohash will throw up 100s of false positives across a dozen or more local businesses that have nothing to do with the point of concern.

  • We can either choose to expose the scale of a 7 char geohash in Safe Places, or try to hide this, but the problems with false positives will be the same.

As a point of reference, here’s an article that ridicules the North Dakota contact tracing app for having +/-65m inaccuracy on location points, rendering it ineffective. Interestingly, the reason behind that is that they did not want to use GPS for location, and so relied on WiFi & cell towers.

https://mashable.com/article/north-dakota-contact-tracing-app/?europe=true

Proposal

Pre-requisite - GPS accuracy

  • A pre-requisite for all of this is that we need to sort out GPS accuracy to be within ~20m most of the time.

  • Based on the considerations above regarding inevitable false positives from 7 char geohashes, and the reporting on the North Dakota app, I think this is mandatory for any GPS-based exposure notification solution.

 

Part 1 - I think we should do this: Safe Paths + JSON interfaces → native geohash.

  • Move Safe Paths to natively store location as 8 char geohash, rather than GPS points.

  • As well as the “main location” geohash, also store a set of “nearby” geohashes, reflecting all geohashes that fall within 20m of the original recorded lat/long position.

  • On JSON interface App->Safe Places, send the “main location” geohash instead of lat/long.

  • On JSON interface Safe Places->App, flag points of concern with geohashes, not lat/long (in time this will move to hashes of geohashes). These should be 8 char geohashes.

  • Change matching logic in Safe Paths to be geohash based, rather than lat/long based.
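As a sketch of what geohash-based matching in Safe Paths might look like, the Python below treats a stored point as a main geohash plus its “nearby” set, and a match as either of those appearing in the points of concern. Everything here is an assumption for illustration: the `LocationPoint` shape, millisecond timestamps, and the simplification that points of concern are a plain set of 8 char geohashes with no time windows. The “longest run” helper is one possible way to require a sustained match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationPoint:
    """Hypothetical stored record: time, main geohash, and nearby geohashes."""
    time_ms: int
    geohash: str
    nearby: frozenset = frozenset()


def is_match(point: LocationPoint, concern: set) -> bool:
    """True if the main geohash or any nearby geohash is a point of concern."""
    return point.geohash in concern or not concern.isdisjoint(point.nearby)


def longest_match_ms(points, concern: set) -> int:
    """Longest time span (ms) covered by consecutive matching points."""
    best = 0
    run_start = None
    for p in sorted(points, key=lambda p: p.time_ms):
        if is_match(p, concern):
            if run_start is None:
                run_start = p.time_ms  # a new run of matches begins
            best = max(best, p.time_ms - run_start)
        else:
            run_start = None  # run broken by a non-matching point
    return best


# Illustrative data only: points logged 5 minutes apart.
FIVE_MIN = 5 * 60 * 1000
concern = {"drt2zp2m"}
history = [
    LocationPoint(0 * FIVE_MIN, "drt2zp2m"),
    LocationPoint(1 * FIVE_MIN, "drt2zp2q", frozenset({"drt2zp2m"})),
    LocationPoint(2 * FIVE_MIN, "drt2zp2m"),
    LocationPoint(3 * FIVE_MIN, "drt2zp2x"),
    LocationPoint(4 * FIVE_MIN, "drt2zp2m"),
]

THRESHOLD_MS = 30 * 60 * 1000  # 30 mins, as discussed above
print(longest_match_ms(history, concern) >= THRESHOLD_MS)
```

Matching on the “nearby” set as well as the main geohash is what recovers points recorded close to a cell edge, at the cost of slightly widening the effective match area.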

 

Part 2 - I think we should consider this, but would love Kyle’s input: Safe Places → native geohash

  • Safe Places stores all locations as 8 char geohashes

  • Safe Places UI adapted to display geohash areas, rather than individual points of concern.

  • When Safe Places users edit or modify points of concern, all they can do is replace one 8 char geohash with another; they don’t have any finer control than this.