18 May - Testing with new geohash algorithm
This report has been superseded by a superior analysis here:
May 20 - Improved analysis of impact of Geohashes & GPS Noise
I have added comments inline here on points I got wrong.
In brief, I have updated the code here: May 3 - Analysis of Exposure Detection Algorithm to reflect the new intersection algorithms that we plan, as described here: Design for MVP1 HA JSON Changes, and run a similar set of tests.
More detailed documentation is to follow, but the high-level results are:
Moving from 20m distance matching to 8 char geohash matching (w/ handling of edges) doesn’t have a very significant impact on false positives or false negatives.
Moving from “any notification” to “At least 66% of points in 30 mins” massively reduces false positives at 150m distance (these mostly arise due to inaccurate GPS)
Moving from a 4 hour exposure notification window to a +/-5 mins notification window also massively reduces false positives at 150m distance
In fact, either one of these measures is sufficient to more-or-less remove false positives at 150m. (We will do further testing to look at false positives at 100m, 50m etc.)
Moving to 7 char geohash tiles undoes all of this. Even with both measures in place, you’ll still get a false positive every 24 hours with 23% of the people who are 150m away from you. Given there will be a lot of people within this sort of distance, this will lead to an infeasibly high number of false positives.
For a basic “meet for a coffee for an hour” scenario, the biggest threat to getting a match is the risk that one or other party’s phone is not logging due to GPS unreliability issues (SAF-278). Even with the old algorithm, the probability of a match was only 52%.
Reducing the exposure notification window from 4 hours to +/-5 mins reduces this to 50%.
Requiring 66% of points to match over 30 mins reduces this further to 32%.
If we can fix GPS unreliability (SAF-278), the probability of a match rises to 96%. The remaining 4% of misses is due to inaccurate GPS (SAF-175), which we’re also working on.
Conclusions:
GPS unreliability (SAF-278) is a much bigger issue than GPS inaccuracy (SAF-175). It causes us to miss about 70% of notifications we should be making (false negatives).
GPS inaccuracy makes a small contribution to misses (false negatives), but the main problem is that it causes false positives.
However, we have effective mitigations against false positives due to GPS inaccuracy:
(1) Requiring 66% of points match in a 30 min window.
(2) Reducing exposure notification window from 4 hours to 5 mins.
Note that using 7 char geohashes (150m x 150m) is not an effective response to GPS inaccuracy. It may help to mask it from users of the system, but it leads to very high numbers of false positives, even with measures (1) and (2) in place.
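To put numbers on the tile sizes and on the 23% figure, the sketch below computes approximate geohash cell dimensions from the standard encoding (5 bits per character, alternating between longitude and latitude, longitude first), and shows what a 23% per-person daily false positive rate would imply for a given number of people nearby. The function names, the metres-per-degree constant, the example of 10 nearby people and the assumption that false positives are independent between people are all illustrative and not taken from the test code.

```python
import math

# Approximate metres per degree at the equator (illustrative constant).
M_PER_DEG = 111_320.0


def geohash_cell_size_m(length, latitude_deg=0.0):
    """Approximate (width, height) in metres of a geohash cell of this length."""
    bits = 5 * length
    lon_bits = (bits + 1) // 2   # longitude takes the first bit, so it gets the extra one
    lat_bits = bits // 2
    width_deg = 360.0 / 2 ** lon_bits
    height_deg = 180.0 / 2 ** lat_bits
    # A degree of longitude shrinks with the cosine of the latitude.
    width_m = width_deg * M_PER_DEG * math.cos(math.radians(latitude_deg))
    height_m = height_deg * M_PER_DEG
    return width_m, height_m


def false_positive_outlook(n_people, per_person_rate=0.23):
    """Expected daily false positives and the chance of at least one.

    Assumes each nearby person triggers a false positive independently.
    """
    expected = n_people * per_person_rate
    p_at_least_one = 1 - (1 - per_person_rate) ** n_people
    return expected, p_at_least_one


if __name__ == "__main__":
    for chars in (7, 8):
        w, h = geohash_cell_size_m(chars)
        print(f"{chars} chars: {w:.0f}m x {h:.0f}m at the equator")
    expected, p_any = false_positive_outlook(10)
    print(f"10 people at 150m: {expected:.1f} expected per day, P(any) = {p_any:.2f}")
```

At the equator this gives roughly 153m x 153m for 7 characters and roughly 38m x 19m for 8 characters. Cells get narrower away from the equator. Under the independence assumption, 10 people living or working 150m away would give about 2.3 expected false positives per day.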