How much protection does hashing offer?
As of 23 May, our design for hashing, as covered here: https://pathcheck.atlassian.net/wiki/spaces/TEST/pages/81231882 and here: https://pathcheck.atlassian.net/wiki/spaces/TEST/pages/61112371 is as follows:
The App records a data point every 5 mins.
The App maps a GPS lat/long/timestamp point into a set of hashes representing “nearby” 8-char geohashes and times. The “nearby” concept addresses corner cases, and we are using a radius of 1e-05 degrees, and 2.5 mins. This results in an average of 4.6 hashes per datapoint (2 x 5-min time buckets x 2.3 8-char geohash tiles).
These values are then hashed using a “slow” hash function, scrypt. We aim to set the cost such that it takes approx 1 second of smartphone CPU time per hash (smartphone CPUs vary, so this needs to be further refined).
Data points generated by Safe Places are hashed in the same way.
JSON data files from Safe Places are published and downloaded every 12 hours.
There is a single salt used for all hashes.
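The design above can be sketched roughly as follows. This is a minimal illustration, not the App's actual code: the key format (geohash followed by the bucket start time), the corner-sampling interpretation of “nearby”, the `SALT` value and the scrypt parameters are all assumptions made for the example, and it makes no claim to reproduce the 2.3-tile average quoted above.

```python
import hashlib

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat, lon, precision=8):
    """Standard geohash: alternate longitude/latitude bisection, 5 bits per char."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, value, bits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value <<= 1
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)

RADIUS_DEG = 1e-05       # spatial radius from the design
HALF_WINDOW_S = 150      # 2.5 minutes
BUCKET_S = 300           # 5-minute time buckets
SALT = b"example-salt"   # hypothetical placeholder for the single shared salt
SCRYPT_PARAMS = dict(n=2**12, r=8, p=1, dklen=32)  # illustrative, far below 1 s/hash

def nearby_keys(lat, lon, unix_time):
    """Assumed reading of "nearby": sample the point and its offsets by the radius."""
    offsets = (-RADIUS_DEG, 0.0, RADIUS_DEG)
    tiles = {geohash_encode(lat + dy, lon + dx) for dy in offsets for dx in offsets}
    buckets = {int(t // BUCKET_S) * BUCKET_S
               for t in (unix_time - HALF_WINDOW_S, unix_time + HALF_WINDOW_S)}
    return sorted(f"{tile}{bucket}" for tile in tiles for bucket in buckets)

def hash_point(lat, lon, unix_time):
    """Slow-hash every nearby (tile, time bucket) key with the shared salt."""
    return [hashlib.scrypt(key.encode(), salt=SALT, **SCRYPT_PARAMS).hex()
            for key in nearby_keys(lat, lon, unix_time)]
```

Because the salt is shared and the hash is deterministic, the App and Safe Places produce identical hashes for the same tile and time bucket, which is what makes matching possible and also what lets an attacker enumerate candidates.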
The characteristics of this are as follows:
Hashing on the App requires 4.6 x 1 x 288 = approx 1,325 smartphone CPU seconds per day.
Let’s call this “1.3k SCS”.
Hashing a typical set of case data (estimated at 1,000 points of concern) costs Safe Places about 1k SCS.
So hashing 100 cases per day will take 100k SCS. We assume the computer doing this is more powerful than a Smartphone. Assuming a factor of 10, this might take 10k seconds of computer time = 3 hours.
An attacker who wishes to test a single geohash for inclusion in the data set, over a single day, must perform 288 hashes, taking 288 SCS, a fairly trivial calculation. This is a known weakness, but we don’t currently have a way to improve this.
1 sq km contains approx 1400 8-char geohashes. The exact value varies with latitude, from about 1,370 at the equator to about 2,100 at 50 degrees north, so 1400 is a conservative figure. To attack 1 sq km over a day will take 1400 x 288 = approx 400k SCS. Supposing the attacker has a server 100x more powerful than a smartphone, that will take 4000 seconds: just over an hour.
Things get better as you expand the area and timeframe. 10km x 10km x 14 days requires 1400 x the effort, so 560M SCS, or 5.6M seconds on the same server - that’s about 2 months.
However, we need to recognize that renting compute resources is incredibly cheap these days. An AWS EC2 “a1.medium” instance costs 2.5c/hour, and you can rent many of these in parallel. So 2 CPU-months of compute resource can be hired for 60 days x 24 hours x 2.5c = $36.00.
Scrypt is memory-hard, which limits the advantage attackers can gain from specialist hardware like ASICs or GPUs.
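The arithmetic in the list above can be reproduced with a small cost model. This is a sketch under the same assumptions as the text (4.6 hashes per point, 1 SCS per hash, 1400 tiles per sq km, a server 100x faster than a smartphone, 2.5c/hour); the function names are made up for illustration, and it further assumes that one rented instance matches the 100x server. Results are unrounded, so they differ slightly from the rounded figures quoted elsewhere on this page.

```python
HASHES_PER_POINT = 4.6          # average from the design
SECONDS_PER_HASH = 1.0          # target scrypt cost on a smartphone (1 SCS)
POINTS_PER_DAY = 24 * 60 // 5   # one data point every 5 minutes = 288
TILES_PER_SQ_KM = 1400          # approximate count of 8-char geohashes
SERVER_SPEEDUP = 100            # assumed attacker server vs. smartphone
PRICE_PER_HOUR = 0.025          # USD, the a1.medium price quoted in the text

def app_scs_per_day():
    """Smartphone CPU seconds the App spends hashing each day."""
    return HASHES_PER_POINT * SECONDS_PER_HASH * POINTS_PER_DAY

def safe_places_seconds(cases, points_per_case=1000, speedup=10):
    """Elapsed seconds to hash case data, at 1 SCS per point as the text implies."""
    return cases * points_per_case * SECONDS_PER_HASH / speedup

def attack_scs(area_sq_km, days):
    """One hash per (tile, 5-minute bucket) pair in the target area and period."""
    return area_sq_km * TILES_PER_SQ_KM * POINTS_PER_DAY * days * SECONDS_PER_HASH

def attack_cost(area_sq_km, days):
    """Return (SCS, seconds on one server, rental cost in USD)."""
    scs = attack_scs(area_sq_km, days)
    server_seconds = scs / SERVER_SPEEDUP
    dollars = server_seconds / 3600 * PRICE_PER_HOUR
    return scs, server_seconds, dollars

if __name__ == "__main__":
    for label, area, days in [("1 sq km, 1 day", 1, 1),
                              ("10km x 10km, 14 days", 100, 14),
                              ("3000 sq km, 14 days", 3000, 14)]:
        scs, secs, usd = attack_cost(area, days)
        print(f"{label}: {scs:,.0f} SCS, {secs / 86400:,.1f} server-days, ${usd:,.2f}")
```

The cost grows linearly with both area and duration, which is why enlarging the protected region helps, but only in proportion to its size.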
Let’s tabulate that:
Activity | Smartphone CPU seconds (SCS) | Estimated elapsed time |
---|---|---|
Smartphone hashes 1 location | 4.6 | 4.6 seconds |
Smartphone CPU usage/day | 1.3k | N/A |
Safe Places publish (100 cases) | 100k | 3 hours |
Safe Places CPU usage/day | 200k | N/A |
Attack on single geohash, 1 day | 288 | 3 seconds |
Attack on 1 sq km, 1 day | 400k | 1 hour 6 mins* (est. cost 3 cents) |
Attack on 10km x 10km, 14 days | 560M | 2 months* (est. cost $36.00) |
Attack on 3000 sq km, 14 days (e.g. Lake County, FL) | 16.8B | 5 years* (est. cost $1,140) |
*However, in practice an attacker could use cloud compute resources in parallel to dramatically reduce this. Cost estimates are dollar costs, assuming 2.5 cents/hour for an a1.medium in AWS EC2.
Conclusion: even vast amounts of compute resource are readily available from the cloud to anyone with a credit card. Ultimately, if someone wants to crack the data apart and is willing to spend some money on it, they can do so pretty easily.
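To illustrate the footnote: each candidate hash is independent of the others, so the work splits cleanly across machines. The sketch below assumes perfect parallelism, per-second billing and no start-up overhead, and the helper names are hypothetical. Under those assumptions the elapsed time shrinks with the number of instances while the total bill stays the same.

```python
import math

PRICE_PER_HOUR = 0.025  # USD per instance-hour, as quoted in the footnote

def instances_needed(server_seconds, deadline_hours):
    """Instances required to finish the job within the deadline."""
    return math.ceil(server_seconds / (deadline_hours * 3600))

def elapsed_hours(server_seconds, instances):
    """Wall-clock hours when the work is split evenly across instances."""
    return server_seconds / instances / 3600

def total_cost(server_seconds):
    """Total rental cost in USD; independent of how many instances share the work."""
    return server_seconds / 3600 * PRICE_PER_HOUR

if __name__ == "__main__":
    job = 5_600_000  # server seconds for the 10km x 10km, 14-day attack
    n = instances_needed(job, deadline_hours=24)
    print(n, "instances,", round(elapsed_hours(job, n), 1), "hours,",
          "$%.2f" % total_cost(job))
```

For the 10km x 10km, 14-day case, the two-month elapsed time drops to under a day with a few dozen instances, at the same total cost.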
Protection at the level of a small city is pretty minimal. Good protection only really comes at the state or national level.