How much protection does hashing offer?

As of 23 May, our design for hashing, as covered here: https://pathcheck.atlassian.net/wiki/spaces/TEST/pages/81231882 and here: https://pathcheck.atlassian.net/wiki/spaces/TEST/pages/61112371 is as follows:

 

  • The App records a data point every 5 mins.

  • The App maps each GPS lat/long/timestamp point to a set of hashes representing “nearby” 8-char geohashes and times. The “nearby” concept addresses corner cases, such as points close to a tile or time-bucket boundary. We are using a radius of 1e-05 degrees and 2.5 mins. This results in an average of 4.6 hashes per data point (2 x 5-min time buckets x 2.3 8-char geohash tiles).

  • These values are hashed (not encrypted) using a “slow” hash, scrypt. We aim to set the cost so that each hash takes approx 1 second of smartphone CPU time (smartphone CPUs vary, so this needs further refinement).

  • Data points generated by Safe Places are hashed in the same way.

  • JSON data files from Safe Places are published and downloaded every 12 hours.

  • There is a single salt used for all hashes.
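The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the App's actual implementation: the key format (geohash followed by the time-bucket index), the salt value, the 8-byte output length, the scrypt parameters, and the way "nearby" tiles are sampled (the point plus its eight offsets at the radius) are all hypothetical choices made for the example. Only the 8-char geohash, the 5-min bucket, the 1e-05 degree radius, the 2.5 min window, and the single shared salt come from the design above.

```python
import hashlib

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet
SALT = b"example-shared-salt"  # hypothetical; the design only says one salt is shared
RADIUS_DEG = 1e-05             # "nearby" radius from the design
HALF_WINDOW_S = 150            # 2.5 mins either side of the timestamp
BUCKET_S = 300                 # 5-min time buckets


def geohash_encode(lat, lon, precision=8):
    """Standard geohash encoding: interleave longitude and latitude bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # even-numbered bits refine longitude, odd ones latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def nearby_keys(lat, lon, unix_ts):
    """Return the set of 'tile + time bucket' keys near a point (assumed format)."""
    offsets = (-RADIUS_DEG, 0.0, RADIUS_DEG)
    tiles = {geohash_encode(lat + dy, lon + dx) for dy in offsets for dx in offsets}
    buckets = {(int(unix_ts) + dt) // BUCKET_S for dt in (-HALF_WINDOW_S, HALF_WINDOW_S)}
    return {f"{tile}{bucket}" for tile in tiles for bucket in buckets}


def slow_hash(key, n=2**12):
    """Hash one key with scrypt; n is far below a 1-second cost, for demonstration."""
    return hashlib.scrypt(key.encode(), salt=SALT, n=n, r=8, p=1, dklen=8).hex()


def hash_point(lat, lon, unix_ts):
    """Map one data point to its set of slow hashes."""
    return {slow_hash(key) for key in nearby_keys(lat, lon, unix_ts)}
```

In a real deployment the scrypt cost parameter `n` would be raised until one hash takes roughly the target time on a typical handset. The sketch does not handle points at the poles or the antimeridian.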

The characteristics of this are as follows:

  • Hashing on the App requires 4.6 x 1 x 288 = approx 1,325 smartphone CPU seconds (SCS) per day.

  • Let’s call this “1.3k SCS”.

  • Hashing a typical set of case data (estimated at 1,000 points of concern) costs Safe Places 1k SCS

  • So hashing 100 cases per day will take 100k SCS. We assume the computer doing this is more powerful than a Smartphone. Assuming a factor of 10, this might take 10k seconds of computer time = 3 hours.

  • An attacker who wishes to test a single geohash for inclusion in the data set, over a single day, must perform 288 hashes, taking 288 SCS, a fairly trivial calculation. This is a known weakness, but we don’t currently have a way to improve on it.

  • 1 sq km contains approx 1400 8-char geohashes. The exact value varies by latitude, but not by more than ~25% until you get above 50 degrees north. To attack 1 sq km over a day will take 1400 x 288 = approx 400k SCS. Supposing the attacker has a server 100x more powerful than a smartphone, that will take about 4000 seconds: just over an hour.

  • Things get better as you expand the area and timeframe. 10km x 10km x 14 days requires 1400 x the effort, so 560M SCS, or 5.6M seconds on the same server - that’s about 2 months.

  • However, we need to recognize that renting compute resources is incredibly cheap these days. An AWS EC2 “a1.medium” instance costs 2.5c/hour, and you can rent many of these in parallel. So 2 CPU-months of compute resource can be hired for 60 x 24 x 2.5c = $36.00.

  • Scrypt is memory-hard, which limits the advantage attackers can gain from specialist hardware such as ASICs or GPUs. It does not remove that advantage entirely.
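The arithmetic in the list above can be reproduced with a short Python sketch. The constants are the estimates stated in this page (4.6 hashes per data point, 1 second per hash, 288 five-minute buckets per day, approx 1400 geohash tiles per sq km); the function names are made up for the example.

```python
HASHES_PER_POINT = 4.6          # average "nearby" hashes per data point
SECONDS_PER_HASH = 1.0          # target smartphone CPU seconds per scrypt hash
BUCKETS_PER_DAY = 24 * 60 // 5  # 288 five-minute time buckets per day
TILES_PER_SQ_KM = 1400          # approx 8-char geohashes per sq km


def app_scs_per_day():
    """Smartphone CPU seconds (SCS) the App spends hashing each day."""
    return HASHES_PER_POINT * SECONDS_PER_HASH * BUCKETS_PER_DAY


def attack_scs(area_sq_km, days, tiles_per_sq_km=TILES_PER_SQ_KM):
    """SCS needed to hash every tile and time bucket in an area and timeframe."""
    return area_sq_km * tiles_per_sq_km * BUCKETS_PER_DAY * days * SECONDS_PER_HASH


def elapsed_seconds(scs, speedup):
    """Wall-clock seconds on one machine that is `speedup` times a smartphone."""
    return scs / speedup


if __name__ == "__main__":
    print(f"App per day:          {app_scs_per_day():,.0f} SCS")
    print(f"1 sq km, 1 day:       {attack_scs(1, 1):,.0f} SCS")
    print(f"10km x 10km, 14 days: {attack_scs(100, 14):,.0f} SCS")
    days = elapsed_seconds(attack_scs(100, 14), speedup=100) / 86400
    print(f"  on a 100x server:   {days:.0f} days")
```

A single-tile attack is the same calculation with `tiles_per_sq_km=1` and an area of 1, which gives the 288 SCS quoted above.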

Let’s tabulate that:

| Activity | Smartphone CPU seconds (SCS) | Estimated elapsed time |
|---|---|---|
| Smartphone hashes 1 location | 4.6 | 4.6 seconds |
| Smartphone CPU usage/day | 1.3k | N/A |
| Safe Places publish (100 cases) | 100k | 3 hours |
| Safe Places CPU usage/day | 200k | N/A |
| Attack on single geohash, 1 day | 288 | 3 seconds |
| Attack on 1 sq km, 1 day | 400k | 1 hour 6 mins* (est. cost 3 cents) |
| Attack on 10km x 10km, 14 days | 560M | 2 months* (est. cost $36.00) |
| Attack on 3000 sq km, 14 days (e.g. Lake County, FL) | 16.8B | 5 years* (est. cost $1,140) |

*However, in practice an attacker could use cloud compute resources in parallel to dramatically reduce the elapsed time. Cost estimates are dollar costs, assuming 2.5 cents/hour for an a1.medium in AWS EC2.
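To make the footnote concrete, the sketch below converts an SCS total into a rental cost and shows how elapsed time shrinks when instances run in parallel. It assumes, as the estimates above do, that one rented instance is 100x faster than a smartphone and costs 2.5 cents/hour; the function names and the 1,000-instance figure are illustrative only. Results land close to the table's figures but not exactly on them, because the table rounds the elapsed time before pricing it.

```python
PRICE_PER_HOUR_USD = 0.025  # assumed a1.medium on-demand price from this page
SPEEDUP = 100               # assumed instance speed relative to a smartphone


def instance_hours(scs):
    """Total instance-hours needed to perform `scs` smartphone CPU seconds of work."""
    return scs / SPEEDUP / 3600


def rental_cost_usd(scs):
    """Dollar cost of the work; independent of how many instances share it."""
    return instance_hours(scs) * PRICE_PER_HOUR_USD


def wall_clock_hours(scs, instances):
    """Elapsed hours when the work is split evenly across parallel instances."""
    return instance_hours(scs) / instances


if __name__ == "__main__":
    for label, scs in [("1 sq km, 1 day", 400e3),
                       ("10km x 10km, 14 days", 560e6),
                       ("3000 sq km, 14 days", 16.8e9)]:
        print(f"{label}: ${rental_cost_usd(scs):,.2f}, "
              f"{wall_clock_hours(scs, instances=1000):,.2f} hours on 1,000 instances")
```

The key point is that the cost stays fixed while the elapsed time divides by the number of instances, so the multi-year figure in the table is not a practical barrier.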

 

Conclusion: even vast amounts of compute resource are readily available from the cloud to anyone with a credit card. Ultimately, if someone wants to crack the data apart and is willing to spend some money on it, they can do so pretty easily.

Protection at the level of a small city is pretty minimal. Good protection only really comes at the state or national level.