A presentation on how microservices were implemented at the Norwegian tax authority (Skatteetaten). It introduces the concepts and shows a few implementation details of a solution for the JVM.
9. Who is paying taxes in Norway?
[Diagram: taxable entities identified by citizen id, foreigner id or passport; locally registered and international entities; relationships: own, employ, relate.]
10. “Partsregister”: tracking taxable entities in Norway
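[Schematic: the sources folkeregister (population register), enhetsregister (register of legal entities), Toll (customs) and Skatteetaten; an event store; the services id (part), management, search, details, relationships and exports; a legacy register.]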
This is a schematic view only.
11. The promise of event-sourcing and our experience
• Event-sourcing allows you to easily change snapshot representation
• unless you did not sufficiently future-proof event capture
• Event-sourcing makes snapshots redundant by replaying events
• unless the event-processing code changes
• Event-sourcing implies full auditability of your application
• unless an error happens during command-to-event processing
• Event-sourcing offers an easy way of debugging applications
• unless events are trivial compared to command input
• Event-sourcing is an easy gateway to share-nothing architecture
• but only if your data can be sharded in the first place
Disclaimer: our approach could be described as a combination of event sourcing
and “command sourcing” with limited capability for scaling writes. But for us, this
solution works great!
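The claim that replaying events makes snapshots redundant can be illustrated with a minimal sketch: a snapshot is a left fold over all events of one id, so it can be thrown away and recomputed at any time. The types `PartEvent` and `PartSnapshot` are hypothetical simplifications (a `null` field means "unchanged"), not Skatteetaten's code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event: new name and/or city of one part (null = unchanged).
record PartEvent(long sequence, String partId, String name, String city) {}

// Hypothetical snapshot of one part.
record PartSnapshot(String name, String city) {
    // Applying an event is a pure function of (old state, event).
    PartSnapshot apply(PartEvent e) {
        return new PartSnapshot(
                e.name() != null ? e.name() : name,
                e.city() != null ? e.city() : city);
    }
}

final class Replay {
    // Rebuilds every snapshot from scratch by folding all events in sequence order.
    static Map<String, PartSnapshot> replay(List<PartEvent> events) {
        Map<String, PartSnapshot> snapshots = new LinkedHashMap<>();
        for (PartEvent e : events) {
            PartSnapshot current = snapshots.getOrDefault(e.partId(), new PartSnapshot(null, null));
            snapshots.put(e.partId(), current.apply(e));
        }
        return snapshots;
    }
}
```

Replaying the same events twice yields equal snapshots only while `apply` stays the same; once the event-processing code changes, the replayed state can differ, which is exactly the caveat on the slide.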
13. Persist events for mistakes that need explicit correction
folkeregister:
1105995521418Some Man 1047100000Oslo 1503XXXXX185719...
1702842193749Some Woman 9755384654Drammen 9456 A529184...
1105995521494 00000Drammen 0000XXXXX000000...
event store:
{
"fnr": "11059955214",
"pnr": "9950174"
}
The presented file formats are simplified for didactical reasons.
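One possible reading of this slide, sketched under assumptions: rather than dropping a malformed input record (such as the third line of the file, which has no name), the import stores it as an explicit error event that keeps the raw line, so that the mistake stays visible until a correcting event is written. The event types `PersonRegistered` and `InvalidRecord` and the validation rules are hypothetical, and the record is assumed to be split into fields already.

```java
// Hypothetical event types produced by the folkeregister import.
interface ImportEvent {}
record PersonRegistered(String fnr, String name, String city) implements ImportEvent {}
record InvalidRecord(String rawLine, String reason) implements ImportEvent {}

final class RecordImport {
    // Maps one parsed record to an event. Malformed input becomes an explicit error event
    // instead of being discarded, so it can later be corrected by a follow-up event.
    static ImportEvent toEvent(String rawLine, String fnr, String name, String city) {
        if (fnr == null || fnr.length() != 11) { // a fødselsnummer has 11 digits
            return new InvalidRecord(rawLine, "fnr must have 11 digits");
        }
        if (name == null || name.isBlank()) {
            return new InvalidRecord(rawLine, "name is missing");
        }
        return new PersonRegistered(fnr, name, city);
    }
}
```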
14. Event-dependent state and sequence numbers
event store:
sequence 1: 11059955214 | Some Man | Oslo
sequence 2: 17028421937 | Some Woman | Drammen
sequence 3: 11059955214 | Some Man | Drammen
/part/9950174/1 and /part/9950174/2:
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Oslo"
}
/part/9950174 and /part/9950174/3:
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Drammen"
}
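The lookup rule behind these paths can be sketched as follows: every state of a part is stored under the global sequence number of the event that produced it, and a read at sequence n returns the newest state whose sequence is at most n. The class `VersionedParts`, its in-memory `TreeMap` storage and the string-valued states are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory view: part id -> (sequence -> state produced at that sequence).
final class VersionedParts {
    private final Map<String, NavigableMap<Long, String>> versions = new HashMap<>();

    // Records the state produced by the event with the given global sequence number.
    void put(long sequence, String partId, String state) {
        versions.computeIfAbsent(partId, id -> new TreeMap<>()).put(sequence, state);
    }

    // Resolves /part/{id}/{sequence}: the newest state with a sequence <= the requested one.
    String at(String partId, long sequence) {
        NavigableMap<Long, String> history = versions.get(partId);
        if (history == null) {
            return null;
        }
        Map.Entry<Long, String> entry = history.floorEntry(sequence);
        return entry == null ? null : entry.getValue();
    }

    // Resolves /part/{id}: the newest state overall.
    String latest(String partId) {
        return at(partId, Long.MAX_VALUE);
    }
}
```

With the three events above, a read of part 9950174 at sequence 2 still yields the Oslo state, because event 2 belonged to another part.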
15. Using sequence numbers for dealing with eventual consistency
/rel/7573509
X-Sequence: 2
{
"owner": "9950174"
}
/part/9950174/2
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Oslo"
}
event store
last-event: 3 | last-event: 2
16. Using sequence numbers for dealing with eventual consistency
/rel/7573509
X-Sequence: 3
{
"owner": "9950174"
}
/part/9950174/3
BAD REQUEST:
{
"sequence": "2"
}
event store
last-event: 2 | last-event: 3
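The check on the serving side can be sketched like this, assuming the caller forwards the sequence from its `X-Sequence` header as the `{sequence}` path segment: a service that has not yet processed that event refuses to answer with stale state and reports its own position instead. `SequenceGuard`, `ReadResult` and the use of status 400 for BAD REQUEST are illustrative assumptions, not the actual HTTP layer.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongFunction;

// Hypothetical HTTP-like response: status code and body.
record ReadResult(int status, String body) {}

// Tracks how far a service has processed the event store and guards reads with it.
final class SequenceGuard {
    private final AtomicLong lastEvent = new AtomicLong(); // highest processed sequence

    // Called by the service's event processing after each batch.
    void advanceTo(long sequence) {
        lastEvent.accumulateAndGet(sequence, Math::max);
    }

    // Serves e.g. /part/{id}/{sequence}. If the requested event has not been processed yet,
    // answer 400 (BAD REQUEST) with the current position so the caller can retry later.
    ReadResult read(long requestedSequence, LongFunction<String> stateAt) {
        long current = lastEvent.get();
        if (requestedSequence > current) {
            return new ReadResult(400, "{\"sequence\": \"" + current + "\"}");
        }
        return new ReadResult(200, stateAt.apply(requestedSequence));
    }
}
```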
17. Publishing thin change feeds to expose application state
event store:
sequence 1: 11059955214 | Some Man | Oslo
sequence 2: 17028421937 | Some Woman | Drammen
sequence 3: 11059955214 | Some Man | Drammen
change feed:
9950174, sequence 1
294851, sequence 2
9950174, sequence 3
/part/9950174/1
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Oslo"
}
/part/294851/2
{
"fnr": "17028421937",
"name": "Some Woman",
"city": "Drammen"
}
/part/9950174/3
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Drammen"
}
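A thin feed entry carries only a part id and a sequence, never the state itself; a consumer keeps the last sequence it has seen and follows the derived path when it needs the data. The sketch below assumes an in-memory feed; `FeedEntry`, `ChangeFeed` and the `limit` parameter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// A thin feed entry: no payload, only where to find the new state.
record FeedEntry(String partId, long sequence) {
    String path() {
        return "/part/" + partId + "/" + sequence;
    }
}

final class ChangeFeed {
    private final List<FeedEntry> entries = new ArrayList<>(); // appended in sequence order

    synchronized void publish(String partId, long sequence) {
        entries.add(new FeedEntry(partId, sequence));
    }

    // Returns up to 'limit' entries with a sequence greater than 'afterSequence'.
    synchronized List<FeedEntry> read(long afterSequence, int limit) {
        List<FeedEntry> result = new ArrayList<>();
        for (FeedEntry e : entries) {
            if (e.sequence() > afterSequence) {
                result.add(e);
                if (result.size() == limit) {
                    break;
                }
            }
        }
        return result;
    }
}
```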
20. Using the event store as a single source of truth
[Diagram: event store, with arrows labeled "read commands", "read events", "write events" and "recover/replicate events".]
Advantages of "command sourcing":
1. Self-healing state after any bug fix without any user management.
2. Only command-to-event mapping is domain-specific code.
3. Minimal risk of misinterpreting events after updates.
Downside: command-to-event processing must be stateless to allow reprocessing.
Revision-sensitive event observers can often remedy this limitation.
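The self-healing property depends on the mapping being stateless: since each event is derived only from its command, all stored commands can be re-mapped after a bug fix and the derived events rebuilt from scratch. A minimal sketch; `Command`, `DerivedEvent` and the whitespace bug in the usage example are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical command as received, e.g. one line of an imported file.
record Command(String uid, String payload) {}

// Hypothetical event derived from a command.
record DerivedEvent(String commandUid, String value) {}

final class CommandSourcing {
    // The mapping sees only the command, never previously derived state, so running it
    // again over the stored commands regenerates all events with the fixed logic.
    static List<DerivedEvent> reprocess(List<Command> commands,
                                        Function<Command, List<DerivedEvent>> mapping) {
        List<DerivedEvent> events = new ArrayList<>();
        for (Command command : commands) {
            events.addAll(mapping.apply(command));
        }
        return events;
    }
}
```

For example, if a first release forgot to trim whitespace from a field, reprocessing the same commands with the corrected mapping replaces every affected event without any manual correction.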
21. Event UIDs for idempotency of write operations
folkeregister:
1105995521418Some Man 1047100000Oslo 1503XXXXX185719...
1702842193749Some Woman 9755384654Drammen 9456 A529184...
1105995521494 00000Drammen 0000XXXXX000000...
event store:
fr:fileABC:1 | Fnr: 11059955214 | Event id: 18 | Name: Some Man | City: Oslo
fr:fileABC:2 | Fnr: 17028421937 | Event id: 49 | Name: Some Woman | City: Drammen
fr:fileABC:3 | Fnr: 1105995521 | Event id: 94 | City: Drammen
Unique keys can also be chosen as UUIDs for live commands.
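Idempotency follows from deriving each UID deterministically from its origin (source, file, line) and letting the store ignore a UID it has already seen, so re-importing a file is harmless. For live commands, the sketch uses a random UUID, which the client would have to reuse when retrying. `IdempotentLog` and its methods are hypothetical names for an in-memory illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

final class IdempotentLog {
    private final Set<String> uids = new HashSet<>();
    private final List<String> values = new ArrayList<>();

    // Deterministic UID for an imported line, e.g. "fr:fileABC:1".
    static String fileUid(String source, String file, int line) {
        return source + ":" + file + ":" + line;
    }

    // Random UID for a live command; a retry must send the same UID again.
    static String liveUid() {
        return UUID.randomUUID().toString();
    }

    // Appends the event unless its UID was written before; returns whether it was appended.
    synchronized boolean append(String uid, String value) {
        if (!uids.add(uid)) {
            return false;
        }
        values.add(value);
        return true;
    }

    synchronized int size() {
        return values.size();
    }
}
```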
22. Using event UIDs as optimistic locks
Requests to /part/749572:
{
"name": "Some Company"
}
{
"name": "Some Company",
"last_id": "gf01Ha"
}
{
"name": "Some Company",
"last_id": "df57Ha"
}
event store (part 749572):
Name: Other Name
Name: Other Name | Last id: gf01Ha
Name: Yet Another Name
Name: Yet Another Name | Last id: gf01Ha
Event UIDs: 46sjGF, df57fF (/part/749572/46sjGF, /part/749572/df57fF)
Event UIDs are non-numeric to avoid confusion with sequence numbers.
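As an optimistic lock, the `last_id` of a write must equal the UID of the part's latest event; a writer that started from an older state is rejected and has to re-read. A minimal in-memory sketch: `OptimisticParts` and `WriteResult` are hypothetical, a `null` expected id is assumed to mean "the part must not exist yet", and random UUID strings stand in for the short UIDs on the slide.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

// Outcome of a conditional write: accepted with the new event UID, or rejected with the current one.
record WriteResult(boolean accepted, String lastId) {}

final class OptimisticParts {
    private final Map<String, String> lastIdByPart = new HashMap<>();
    private final Map<String, String> nameByPart = new HashMap<>();

    // Accepts the write only if 'expectedLastId' equals the UID of the part's latest event.
    // Assumption for this sketch: a null 'expectedLastId' means "the part must not exist yet".
    synchronized WriteResult write(String partId, String name, String expectedLastId) {
        String current = lastIdByPart.get(partId);
        if (!Objects.equals(current, expectedLastId)) {
            return new WriteResult(false, current); // stale view: re-read and retry
        }
        // A UUID string contains dashes, so it cannot be mistaken for a sequence number.
        String uid = UUID.randomUUID().toString();
        lastIdByPart.put(partId, uid);
        nameByPart.put(partId, name);
        return new WriteResult(true, uid);
    }

    synchronized String name(String partId) {
        return nameByPart.get(partId);
    }
}
```

Two clients that both read the same `last_id` can race; only the first write is accepted, and the second receives the new UID it missed.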
23. Deleting events and compaction events
Why would you want to delete events?
1. Because you want to.
Storage space is not free after all.
2. Because you should.
Storing obsolete personal data makes you a target for attackers and is immoral.
3. Because you have to.
Laws like the GDPR demand physical erasure.
24. Deleting events with tombstones
event store:
17028421937 | Some Woman | Drammen
11059955214 | Some Man | Oslo
11059955214 | Some Man | Drammen
11059955214 | [tombstone]
/part/9950174/1
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Oslo"
}
/part/294851/2
{
"fnr": "17028421937",
"name": "Some Woman",
"city": "Drammen"
}
/part/9950174/3
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Drammen"
}
Tombstones themselves must not be deleted, so that the deletion can propagate to all services.
For this reason, it is crucial to choose primary identifiers that do not contain personal data (unlike a fødselsnummer).
Ideally, an internal, synthetic identifier is used as a proxy for each personal identifier.
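Deletion with a tombstone can be sketched as: physically remove every event of the id, then append a tombstone that is never removed, so that services reading the log later still learn about the deletion. Because the tombstone survives, it carries only a synthetic part id. `StoredEvent` (with `null` marking a tombstone) and `TombstoneLog` are hypothetical in-memory stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stored event; a null value marks a tombstone.
record StoredEvent(long sequence, String partId, String value) {
    boolean isTombstone() {
        return value == null;
    }
}

final class TombstoneLog {
    private final List<StoredEvent> events = new ArrayList<>();
    private long nextSequence = 1;

    synchronized void append(String partId, String value) {
        events.add(new StoredEvent(nextSequence++, partId, value));
    }

    // Erases all events of the part and appends a tombstone with a new sequence.
    // The tombstone only holds the synthetic part id, so keeping it stores no personal data.
    synchronized void delete(String partId) {
        events.removeIf(e -> e.partId().equals(partId));
        events.add(new StoredEvent(nextSequence++, partId, null));
    }

    synchronized List<StoredEvent> readAfter(long sequence) {
        List<StoredEvent> result = new ArrayList<>();
        for (StoredEvent e : events) {
            if (e.sequence() > sequence) {
                result.add(e);
            }
        }
        return result;
    }
}
```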
25. Compacting events with compaction events
event store:
17028421937 | Some Woman | Drammen
11059955214 | Some Man | Oslo
11059955214 | Some Man | Drammen
11059955214 | Some Man | Drammen | [compaction: 3]
/part/9950174/1
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Oslo"
}
/part/294851/2
{
"fnr": "17028421937",
"name": "Some Woman",
"city": "Drammen"
}
/part/9950174/3
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Drammen"
}
/part/9950174/1 (compacted):
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Oslo",
"compacted": "3"
}
Can be represented by the same database entity.
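One possible reading of compaction, sketched under assumptions: all events of one id up to a sequence n are replaced by a single compaction entry that holds the folded state and keeps sequence n, so readers positioned after n see no change. `LogEntry`, `CompactingLog` and the caller-supplied fold function are hypothetical; the slide does not specify how the fold is computed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BinaryOperator;

record LogEntry(long sequence, String partId, String value, boolean compacted) {}

final class CompactingLog {
    private final List<LogEntry> entries = new ArrayList<>();

    synchronized void append(long sequence, String partId, String value) {
        entries.add(new LogEntry(sequence, partId, value, false));
    }

    // Replaces all entries of 'partId' with sequence <= upTo by one compaction entry that
    // holds the folded state and keeps the sequence of the last event it replaces.
    synchronized void compact(String partId, long upTo, BinaryOperator<String> fold) {
        String state = null;
        long last = -1;
        List<LogEntry> kept = new ArrayList<>();
        for (LogEntry e : entries) {
            if (e.partId().equals(partId) && e.sequence() <= upTo) {
                state = state == null ? e.value() : fold.apply(state, e.value());
                last = e.sequence();
            } else {
                kept.add(e);
            }
        }
        if (last < 0) {
            return; // nothing to compact
        }
        kept.add(new LogEntry(last, partId, state, true));
        kept.sort(Comparator.comparingLong(LogEntry::sequence)); // restore sequence order
        entries.clear();
        entries.addAll(kept);
    }

    synchronized List<LogEntry> entries() {
        return new ArrayList<>(entries);
    }
}
```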
27. What is out there?
API wrapper for MongoDB.
Originates from the .NET space.
Java client but Scala-oriented.
Java framework for CQRS.
Strict command and event separation.
Support for JDBC integration.
Append-only database.
Only recently published.
DIY at Skatteetaten. Reasons for choice:
1. Performance.
Streaming has a high overhead for mass processing.
Need for microbatching to allow for microservice orchestration.
2. Complexity
Event sourcing is not yet mainstream. APIs often feel immature.
Event stores often aim for distributability at the cost of simplicity.
3. Loose command-to-aggregate mapping
Many frameworks assume that there exists an obvious mapping
of any command to an aggregate.
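28. Events and event stores
import java.util.function.Consumer;
import java.util.stream.Stream;
class Event {
  long sequence; // 0 if not set
  String uid;    // unique event id, used for idempotency
  String id;     // id of the aggregate (part) the event belongs to
  String type;   // XML namespace id
  String value;  // XML
}
// Assumed definition: a Consumer that can be used in try-with-resources.
interface ClosableConsumer<T> extends Consumer<T>, AutoCloseable {
  @Override
  void close();
}
interface EventStore {
  Stream<Event> read(long afterSequence);
  ClosableConsumer<Event> write();
}
// Copying all events from one store to another (both initialized elsewhere):
EventStore source, target;
try (Stream<Event> stream = source.read(0);
     ClosableConsumer<Event> consumer = target.write()) {
  stream.forEach(consumer);
}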
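A minimal sketch of what an `InMemoryEventStore` could look like, assuming `ClosableConsumer` is simply a `Consumer` that is also `AutoCloseable`. The definitions are repeated so the block compiles on its own; the store assigns sequence numbers on write and silently drops events whose `uid` it already holds, mirroring the uid check of the SQL insert. This is an illustration, not the Skatteetaten implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.stream.Stream;

class Event {
    long sequence; // 0 if not set
    String uid;
    String id;
    String type;   // XML namespace id
    String value;  // XML

    Event(long sequence, String uid, String id, String type, String value) {
        this.sequence = sequence;
        this.uid = uid;
        this.id = id;
        this.type = type;
        this.value = value;
    }
}

// Assumed shape: a Consumer usable in try-with-resources, close() without checked exceptions.
interface ClosableConsumer<T> extends Consumer<T>, AutoCloseable {
    @Override
    void close();
}

interface EventStore {
    Stream<Event> read(long afterSequence);
    ClosableConsumer<Event> write();
}

class InMemoryEventStore implements EventStore {
    private final List<Event> events = new ArrayList<>();
    private final Set<String> uids = new HashSet<>();

    @Override
    public synchronized Stream<Event> read(long afterSequence) {
        List<Event> copy = new ArrayList<>(); // snapshot, so later writes do not affect the stream
        for (Event e : events) {
            if (e.sequence > afterSequence) {
                copy.add(e);
            }
        }
        return copy.stream();
    }

    @Override
    public ClosableConsumer<Event> write() {
        return new ClosableConsumer<Event>() {
            @Override
            public void accept(Event e) {
                synchronized (InMemoryEventStore.this) {
                    if (!uids.add(e.uid)) {
                        return; // same uid written before: ignore (idempotent write)
                    }
                    events.add(new Event(events.size() + 1, e.uid, e.id, e.type, e.value));
                }
            }

            @Override
            public void close() {
                // nothing to flush for an in-memory store
            }
        };
    }
}
```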
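29. Events and event stores
class SQLEventStore implements EventStore
class InMemoryEventStore implements EventStore
class HttpEventStore implements EventStore
-- Writing an event: the lock serializes writers, so sequence numbers are committed in ascending order,
-- and the insert is skipped if an event with the same uid already exists.
LOCK TABLE events IN EXCLUSIVE MODE;
INSERT INTO events (sequence, uid, id, type, value)
SELECT seq.NEXTVAL, ?, ?, ?, ?
FROM dual
WHERE ? NOT IN (SELECT uid FROM events);
-- Reading the next batch of at most 1000 events after a given sequence:
SELECT *
FROM events
WHERE sequence > ?
ORDER BY sequence
FETCH FIRST 1000 ROWS ONLY;
-- The same query with an optimizer hint to use the index named seq:
SELECT /*+ index(events seq) */ *
FROM events
WHERE sequence > ?
ORDER BY sequence
FETCH FIRST 1000 ROWS ONLY;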
31. Aggregates and aggregate stores
class SQLAggregateStore implements WritableAggregateStore
class InMemoryAggregateStore implements WritableAggregateStore
class HttpAggregateStore implements AggregateStore
-- Reading the state of an aggregate as of a given sequence:
SELECT s.id, s.value
FROM aggregates s
INNER JOIN (
  SELECT MAX(sequence) ms, id
  FROM aggregates
  WHERE sequence <= ?
  GROUP BY id
) t
ON s.id = t.id
AND s.sequence = t.ms
WHERE s.id = ?;
-- Writing a new version of an aggregate:
INSERT INTO aggregates (sequence, id, value)
VALUES (?, ?, ?);
34. Polling or pushing events
[Diagram: three ways of reading from an event store: interval polling (/events/0, /events/1000, /events/1380); a websocket (/socket/0, "require 1000"); a broadcast that triggers polling (/udp/broadcast, /events/1320).]
Interval polling:
- Adds network overhead
- Interval adds latency
- Serves as a heartbeat
- Simple and works well with few consumers
Websockets:
- Allows for reactive programming
- Fast processing can break micro-batching
- Optimizes for low latency
- Slow in unstable networks
Broadcast-triggered polling:
- Avoids long-lasting connections
- Scales better with consumer count
- Adds latency in unstable networks
- Broadcast can be delayed under high load
35. Things to mention
1. We do not process single events.
Instead of real streaming, we apply "micro-batching".
Without it, HTTP calls between microservices would stall our system.
2. We do not use transactions.
In case of an error, we simply reset an aggregate store to the last known sequence id.
This also allows us to use multiple databases such as Oracle and Elasticsearch without XA.
3. We have cut some corners.
To save time and money, not everything presented is implemented at Skatteetaten.
4. Asynchronicity and eventual consistency are optional concepts.
By processing messages as they arrive, it is possible to implement an event-sourced
application without eventually consistent state.
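Points 1 and 2 can be combined in one sketch: an aggregator pulls batches of events after its last applied sequence and, instead of wrapping a batch in a transaction, deletes the rows of a partially applied batch when something fails. `BatchEvent`, `MicroBatchAggregator` and the `readAfter` function are hypothetical; the `TreeMap` stands in for the aggregates table, and a `null` value stands in for any processing error.

```java
import java.util.List;
import java.util.TreeMap;
import java.util.function.LongFunction;

// Hypothetical event as seen by the aggregator.
record BatchEvent(long sequence, String id, String value) {}

final class MicroBatchAggregator {
    // Rows keyed by sequence, like the aggregates table (sequence, id, value).
    final TreeMap<Long, BatchEvent> aggregates = new TreeMap<>();
    long lastSequence; // last sequence known to be fully applied

    // Applies one batch. On failure, removes every row written after the last known
    // sequence (no transaction needed) and rethrows; lastSequence stays unchanged.
    void applyBatch(List<BatchEvent> batch) {
        try {
            for (BatchEvent e : batch) {
                if (e.value() == null) { // hypothetical stand-in for a processing error
                    throw new IllegalArgumentException("cannot process event " + e.sequence());
                }
                aggregates.put(e.sequence(), e);
            }
            if (!batch.isEmpty()) {
                lastSequence = batch.get(batch.size() - 1).sequence();
            }
        } catch (RuntimeException ex) {
            aggregates.tailMap(lastSequence, false).clear(); // reset to last known sequence
            throw ex;
        }
    }

    // Pulls batches after the last applied sequence until nothing new is returned.
    void catchUp(LongFunction<List<BatchEvent>> readAfter) {
        List<BatchEvent> batch;
        while (!(batch = readAfter.apply(lastSequence)).isEmpty()) {
            applyBatch(batch);
        }
    }
}
```

After a reset, the next `catchUp` simply reads again from `lastSequence`, which is why the approach works across several databases without XA.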
43. Things to mention
1. Full partitioning (sharding) conflicts with a total order of all events in the store.
Message log systems such as Kafka use partitions to achieve performance.
This might hinder future services that want to aggregate events across partitions.
2. Beware of time-based sequencing.
Databases like MongoDB generate ordering ids based on the system clock.
Clocks are not fully reliable, even when using NTP.
3. We split reader responsibility into aggregator and API for blue/green deployment.
As we do not require full versioning of all components, a parallel deployment of an
application allows us to recreate a "fixed" version that can replace an older version.
4. Operating microservices requires a significant amount of resources.
HTTP and (un-)marshalling are expensive operations. While enabling scalability,
a distributed architecture requires a baseline of additional resources to match the
level of centralized applications.