TIDY: Time-based IDs. the y is up to You. https://howl.moe/tidy

                          import "howl.moe/tidy"



Tidy
====

Documentation: https://godoc.org/howl.moe/tidy

A Snowflake-like 64-bit ID with a constant 11-byte string representation usable
in URLs, which will keep producing unique IDs for roughly another 9000 years.
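
A minimal usage sketch (the method names here are assumptions inferred from the
benchmark names under Performance; see the documentation link above for the
actual API):

    package main

    import (
        "fmt"

        "howl.moe/tidy"
    )

    func main() {
        // Assumed constructor, per BenchmarkNew: returns a new time-based ID.
        id := tidy.New()

        // Assumed: the ID implements encoding.TextMarshaler, per
        // BenchmarkTextMarshalID, yielding the 11-byte string form.
        text, err := id.MarshalText()
        if err != nil {
            panic(err)
        }
        fmt.Println(string(text))
    }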

Why
---

There are many reasons why time-based IDs are better than sequential IDs: they
don't require keeping a persistent ID counter somewhere; they are better for
security (a third party can't find an admin account by simply looking for user
ID 1); and you don't need to store the creation date of a resource separately,
since it is embedded in the ID.

Twitter solved this problem by creating Snowflake IDs:
https://github.com/twitter/snowflake/tree/b3f6a3c6ca8e1b6847baa6ff42bf72201e2c2231

These IDs, however, have problems of their own, chief among which is that they
roll over after 69 years (Snowflake stores a 41-bit millisecond timestamp, and
2^41 ms is roughly 69 years). While I don't realistically expect my software to
still be in use in 69 years (I definitely don't use software from 1949), I
still want to guard against this.

How it works
------------

IDs are 64-bit unsigned integers. The lower 15 bits hold the step number, which
is increased atomically each time New() is called. The upper 49 bits hold the
timestamp: the current Unix time in nanoseconds divided by 2^19. This gives
roughly 0.5-millisecond precision (2^19 = 524 288 ns, 1 millisecond =
1 000 000 ns) and keeps the timestamp from rolling over for about 9 300 years
(2^49 units * 2^19 ns = 2^68 ns). It also means up to 2^15 IDs can be generated
in each unit (~0.5 ms) without any possibility of a conflict - in other words,
an ID would have to be generated every 16 nanoseconds for a clash to occur.
That is generally not possible: on modern hardware, even running in parallel,
New() takes about 40 ns per call.
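
Composing an ID with this layout looks roughly like the following (an
illustrative sketch of the scheme described above, not the package's actual
code; all names and constants here are made up):

    package main

    import (
        "fmt"
        "sync/atomic"
        "time"
    )

    // Illustrative constants matching the layout described above.
    const (
        stepBits  = 15              // lower 15 bits: step counter
        timeShift = 19              // one time unit = 2^19 ns (~0.5 ms)
        stepMask  = 1<<stepBits - 1 // 0x7fff
    )

    var step uint64 // incremented atomically on every call

    // newID puts the ~0.5 ms timestamp unit in the upper 49 bits and the
    // step counter in the lower 15 bits.
    func newID() uint64 {
        t := uint64(time.Now().UnixNano()) >> timeShift
        s := atomic.AddUint64(&step, 1) & stepMask
        return t<<stepBits | s
    }

    func main() {
        fmt.Println(newID())
    }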

Performance
-----------

On my Thinkpad X220 with an Intel i5:

BenchmarkNew-4                   20000000      63.0 ns/op     0 B/op    0 allocs/op
BenchmarkNewParallel-4           30000000      39.8 ns/op     0 B/op    0 allocs/op
BenchmarkBinaryMarshalID-4     2000000000      0.96 ns/op     0 B/op    0 allocs/op
BenchmarkBinaryUnmarshalID-4   2000000000      0.96 ns/op     0 B/op    0 allocs/op
BenchmarkTextMarshalID-4         20000000      64.9 ns/op    16 B/op    1 allocs/op
BenchmarkTextUnmarshalID-4       30000000      43.5 ns/op     0 B/op    0 allocs/op
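
These numbers come from the package's own benchmarks and can be reproduced with
the standard Go tooling:

    go test -bench=. -benchmem howl.moe/tidy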

Working with binary data is incredibly cheap, mostly because the functions are
really just doing a binary encoding of a number, and they can even be inlined.
The text representation is a bit more expensive, and text marshalling even
requires an allocation, but all of these finish within 100 ns, so performance
is not an issue.
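
For illustration, marshalling a 64-bit ID to binary essentially boils down to
writing the integer out as 8 bytes (a sketch with an assumed big-endian byte
order; the package's actual encoding may differ):

    package main

    import (
        "encoding/binary"
        "fmt"
    )

    // Sketch only: a 64-bit ID serialises to its 8 bytes and back.
    func marshalID(id uint64) []byte {
        buf := make([]byte, 8)
        binary.BigEndian.PutUint64(buf, id) // assumed byte order
        return buf
    }

    func unmarshalID(b []byte) uint64 {
        return binary.BigEndian.Uint64(b)
    }

    func main() {
        b := marshalID(123456789)
        fmt.Println(unmarshalID(b)) // 123456789
    }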

Generation of IDs is mostly "expensive" because it requires a syscall (to get the
current time).

History
-------

This is boilerplate code I found myself using in a variety of projects. I always
refrained from making it a package of its own because it's only a few hundred
lines of code, and "a little copying is better than a little dependency", but
this is so useful that it deserves its own package.

It was originally born in a project that ended up in my graveyard of side
projects, Constellate. Existing solutions either required 128-bit IDs
(far too big) or had problems (like Snowflake).

License
-------

MIT. Copyright (c) 2018 Morgan Bazalgette