Event-Sourcing Microservices on the JVM at the Norwegian Tax Authority
Concept
Implementation
Operation
Logging events
Account (starting balance 0): credit 100, credit 50, debit 200, credit 150
Balance after each command: 100, 150, -50, 100
2018/05/02 14:30 credited 100 to account
2018/05/02 18:15 credited 50 to account
2018/05/05 10:00 debited 200 from account
2018/05/06 12:00 credited 150 to account
State auditing
Account (starting balance 0): credit 100, credit 50, debit 200, credit 150
Balance after each command: 100, 150, -50, 100
Audit: 100, 150, -50
Event sourcing
Commands: credit 100, credit 50, debit 200, credit 150
Events: credited 100, credited 50, debited 200, credited 150
Snapshot (starting at 0): 100, 150, -50, 100
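The replay idea on these slides can be sketched in Java, the language used later in the deck. This is a minimal sketch, assuming events are plain records with a type (`"credited"` or `"debited"`) and an amount; `AccountEvent` and `Account.replay` are hypothetical names. The snapshot (the balance) is never the primary data: it is recomputed by folding over the events, so it can be discarded and rebuilt at any time.

```java
import java.util.List;

// Hypothetical event shape: a type ("credited" or "debited") and an amount.
record AccountEvent(String type, long amount) {}

final class Account {
    private Account() {}

    // Derives the snapshot (the balance) by folding over the events, starting from 0.
    static long replay(List<AccountEvent> events) {
        long balance = 0;
        for (AccountEvent event : events) {
            switch (event.type()) {
                case "credited" -> balance += event.amount();
                case "debited" -> balance -= event.amount();
                default -> { } // other event types do not change the balance
            }
        }
        return balance;
    }
}
```

Replaying only the first three events reproduces the intermediate balance of -50, which is how a snapshot for any earlier point in time can be rebuilt.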
Event sourcing: resetting snapshots
Commands: credit 100, credit 50, debit 200, credit 150
Events: credited 100, credited 50, debited 200, credited 150
Snapshot (starting at 0): 100, 150, -50, 100
Event sourcing: events and commands
Commands: credit 100, credit 50, debit 200, credit 150
Events: credited 100, credited 50, overrun, credited 150
Snapshot (starting at 0): 100, 150, 300
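To illustrate how commands and events differ, here is a hedged sketch in which commands are validated against the current snapshot before they become events. The rule that a debit may not exceed the balance is an assumption inferred from the "overrun" event on the slide; `AccountCommand`, `AccountEvent` and `AccountCommandHandler` are hypothetical names. The rejected debit is still recorded as an event, so the log explains why the balance ends at 300.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shapes: commands are "credit"/"debit", events are "credited"/"debited"/"overrun".
record AccountCommand(String type, long amount) {}
record AccountEvent(String type, long amount) {}

final class AccountCommandHandler {
    private final List<AccountEvent> events = new ArrayList<>();
    private long balance; // snapshot derived from the events recorded so far

    // Turns a command into an event. A debit that exceeds the balance is not dropped:
    // it is recorded as an "overrun" event that leaves the balance unchanged.
    AccountEvent handle(AccountCommand command) {
        AccountEvent event = switch (command.type()) {
            case "credit" -> new AccountEvent("credited", command.amount());
            case "debit" -> command.amount() <= balance
                    ? new AccountEvent("debited", command.amount())
                    : new AccountEvent("overrun", command.amount());
            default -> throw new IllegalArgumentException("unknown command: " + command.type());
        };
        events.add(event);
        apply(event);
        return event;
    }

    private void apply(AccountEvent event) {
        switch (event.type()) {
            case "credited" -> balance += event.amount();
            case "debited" -> balance -= event.amount();
            default -> { } // an overrun does not change the balance
        }
    }

    long balance() {
        return balance;
    }

    List<AccountEvent> events() {
        return List.copyOf(events);
    }
}
```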
Command query [responsibility] segregation (CQ[R]S)
Commands do not query state.
Queries do not change state.
Who is paying taxes in Norway?
Identifiers: passport, foreigner id, citizen id
Entities: international, locally registered
Relationships: own, employ, relate
“Partsregister”: tracking taxable entities in Norway
Sources: folkeregister, enhetsregister, Toll, Skatteetaten
event store
Components: id (part), management, search, details, relationships, exports, legacy register
This is a schematic view only.
The promise of event-sourcing and our experience
• Event-sourcing allows you to easily change snapshot representation
• unless you did not sufficiently future-proof event capture
• Event-sourcing makes snapshots redundant by replaying events
• unless the event-processing code changes
• Event-sourcing implies full auditability of your application
• unless an error happens during command-to-event processing
• Event-sourcing offers an easy way of debugging applications
• unless events are trivial compared to command input
• Event-sourcing is an easy gateway to share-nothing architecture
• but only if you can shard your data in the first place
Disclaimer: our approach could be described as a combination of event sourcing
and “command sourcing” with limited capability for scaling writes. But for us, this
solution works great!
1105995521418Some Man 1047100000Oslo 1503XXXXX185719...
1702842193749Some Woman 9755384654Drammen 9456 A529184...
1105995521494 00000Drammen 0000XXXXX000000...
folkeregister
Why “command sourcing”?
The presented file formats are simplified for didactical reasons.
event store
{
"fnr": "11059955214",
"name": "Some Man"
}
{
"fnr": "11059955214",
"name": "gome Man"
}
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Oslo"
}
1105995521418Some Man 1047100000Oslo 1503XXXXX185719...
1702842193749Some Woman 9755384654Drammen 9456 A529184...
1105995521494 00000Drammen 0000XXXXX000000...
folkeregister
Persist events for mistakes that need explicit correction
The presented file formats are simplified for didactical reasons.
event store
{
"fnr": "11059955214",
"pnr": "9950174"
}
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Oslo"
}
Event-dependent state and sequence numbers
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Drammen"
}
/part/9950174/1 /part/9950174
event store
sequence 1: 11059955214, Some Man, Oslo
sequence 2: 17028421937, Some Woman, Drammen
sequence 3: 11059955214, Some Man, Drammen
/part/9950174/2: 11059955214, Some Man, Oslo
/part/9950174/3: 11059955214, Some Man, Drammen
/rel/7573509
Using sequence numbers for dealing with eventual consistency
X-Sequence: 2
{
"owner": "9950174"
}
/part/9950174/2
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Oslo"
}
event store
last-event: 3 last-event: 2
/rel/7573509
Using sequence numbers for dealing with eventual consistency
X-Sequence: 3
{
"owner": "9950174"
}
/part/9950174/3
BAD REQUEST:
{
"sequence": "2"
}
event store
last-event: 2 last-event: 3
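The sequence check on these two slides could look roughly like this on the receiving service. It is a sketch, assuming the service tracks the sequence of the last event it has processed and that the client forwards the sequence it last saw in the `X-Sequence` header; `RelationshipResource` and `Response` are hypothetical stand-ins for a real HTTP framework.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an HTTP response.
record Response(int status, String body) {}

final class RelationshipResource {
    private final Map<String, String> relationships = new ConcurrentHashMap<>();
    // Sequence of the last event this service has applied; volatile so request threads see it.
    private volatile long lastEvent;

    // Called by the single event-processing thread; state is written before the marker advances.
    void onEvent(long sequence, String relationId, String json) {
        relationships.put(relationId, json);
        lastEvent = sequence;
    }

    // 'xSequence' is the value of the X-Sequence request header, if the client sent one.
    Response get(String relationId, OptionalLong xSequence) {
        long processed = lastEvent;
        if (xSequence.isPresent() && xSequence.getAsLong() > processed) {
            // The client has already seen newer state than this service has processed:
            // refuse rather than answer with stale data, and report how far this service has come.
            return new Response(400, "{\"sequence\": \"" + processed + "\"}");
        }
        String body = relationships.get(relationId);
        return body == null ? new Response(404, "") : new Response(200, body);
    }
}
```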
9950174
sequence 3
9950174
sequence 1
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Drammen"
}
{
"fnr": "11059955214",
"name": "Some Man",
"city": "grammen"
}
event store
11059955214, Some Man, Oslo
17028421937, Some Woman, Drammen
11059955214, Some Man, Drammen
11059955214, Some Man, Oslo
11059955214, Some Man, Drammen
Publishing thin change feeds to expose application state
294851
sequence 2
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Oslo"
}
{
"fnr": "17028421937",
"name": "Some Woman",
"city": "Drammen"
}
/part/9950174/1 /part/294851/2 /part/9950174/3
9950174
sequence 3
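A thin change feed only announces which aggregate version changed; consumers fetch the state through the link. This is a minimal in-memory sketch, assuming entries are published in sequence order; `FeedEntry` and `ChangeFeed` are hypothetical names, and the link format follows the `/part/<id>/<sequence>` paths shown above.

```java
import java.util.ArrayList;
import java.util.List;

// A thin feed entry only points at an aggregate version; it carries no state of its own.
record FeedEntry(String id, long sequence) {
    String link() {
        return "/part/" + id + "/" + sequence;
    }
}

final class ChangeFeed {
    private final List<FeedEntry> entries = new ArrayList<>(); // ordered by sequence

    synchronized void publish(String id, long sequence) {
        entries.add(new FeedEntry(id, sequence));
    }

    // Up to 'limit' entries after 'afterSequence'; a consumer then GETs each link it cares about.
    synchronized List<FeedEntry> after(long afterSequence, int limit) {
        List<FeedEntry> result = new ArrayList<>();
        for (FeedEntry entry : entries) {
            if (result.size() == limit) {
                break;
            }
            if (entry.sequence() > afterSequence) {
                result.add(entry);
            }
        }
        return result;
    }
}
```

Keeping the feed thin means a consumer only downloads the aggregates it actually needs.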
Revisioning aggregates for idempotency
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Drammen"
}
{
"fnr": "11059955214",
"name": "Some Man",
"city": "grammen"
}
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Oslo"
}
{
"fnr": "17028421937",
"name": "Some Woman",
"city": "Drammen"
}
/part/9950174/1 /part/294851/2 /part/9950174/3
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Drammen"
}
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Oslo"
}
{
"fnr": "17028421937",
"name": "Some Woman",
"city": "Drammen"
}
diff diff diff
17028421937, Some Woman, Drammen
11059955214, Some Man, Oslo
11059955214, Some Man, Drammen
reprocess
/part/9950174/3/1
/part/9950174/3[/2]
/part/294851/2/1 /part/9950174/1/1
event store
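Reprocessing after a bug fix (here, the "grammen" typo becoming "Drammen") should only produce new output where the result actually changed. This sketch assumes aggregates are compared as strings, a stand-in for the slide's diff step; `RevisionedAggregateStore` is a hypothetical name, and links follow the `/part/<id>/<sequence>/<revision>` pattern from the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stores every revision of the aggregate computed for one event, keyed by "id/sequence".
final class RevisionedAggregateStore {
    private final Map<String, List<String>> revisions = new HashMap<>();

    // Writes a (re)computed aggregate and returns its revision number (starting at 1).
    // If reprocessing yields exactly the stored value, no new revision is created, so
    // replaying with unchanged code is idempotent and consumers have nothing new to fetch.
    synchronized int write(String id, long sequence, String aggregate) {
        List<String> byKey = revisions.computeIfAbsent(id + "/" + sequence, key -> new ArrayList<>());
        if (!byKey.isEmpty() && byKey.get(byKey.size() - 1).equals(aggregate)) {
            return byKey.size();
        }
        byKey.add(aggregate);
        return byKey.size();
    }

    synchronized String latest(String id, long sequence) {
        List<String> byKey = revisions.get(id + "/" + sequence);
        return byKey == null ? null : byKey.get(byKey.size() - 1);
    }

    // Feed link in the /part/<id>/<sequence>/<revision> style.
    synchronized String latestLink(String id, long sequence) {
        List<String> byKey = revisions.get(id + "/" + sequence);
        return byKey == null ? null : "/part/" + id + "/" + sequence + "/" + byKey.size();
    }
}
```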
9950174
sequence 3
9950174
sequence 1
revision 1
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Drammen"
}
{
"fnr": "11059955214",
"name": "Some Man",
"city": "grammen"
}
Publishing revisions in a feed
294851
sequence 2
revision 1
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Oslo"
}
{
"fnr": "17028421937",
"name": "Some Woman",
"city": "Drammen"
}
/part/9950174/1/1 /part/294851/2/1 /part/9950174/3/1
9950174
sequence 3
revision 1
9950174
sequence 3
revision 2
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Drammen"
}
/part/9950174/3/2
Using the event store as a single source of truth
event store
read commands read events
recover/replicate
events
write events
Advantages of "command sourcing":
1. Self-healing state after any bug fix, without manual intervention.
2. Only the command-to-event mapping is domain-specific code.
3. Minimal risk of misinterpreting events after updates.
Downside: command-to-event processing must be stateless to allow reprocessing.
Revision-sensitive event observers can often remedy this limitation.
Event UIDs for idempotency of write operations
1105995521418Some Man 1047100000Oslo 1503XXXXX185719...
1702842193749Some Woman 9755384654Drammen 9456 A529184...
1105995521494 00000Drammen 0000XXXXX000000...
event store
Fnr: 11059955214, Event id: 18, Name: Some Man, City: Oslo
Fnr: 17028421937, Event id: 49, Name: Some Woman, City: Drammen
Fnr: 1105995521, Event id: 94, City: Drammen
folkeregister
fr:fileABC:1 fr:fileABC:2 fr:fileABC:3
Unique keys can also be chosen as UUIDs for live commands.
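The uid rule can be sketched with an in-memory store that silently skips events whose uid it has already seen, the same effect as the `WHERE ? NOT IN (SELECT uid FROM events)` insert later in the deck. `IdempotentEventStore`, `StoredEvent`, `fileUid` and `liveUid` are hypothetical names; the `fr:<file>:<line>` format mirrors the `fr:fileABC:1` keys above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

record StoredEvent(long sequence, String uid, String value) {}

final class IdempotentEventStore {
    private final List<StoredEvent> events = new ArrayList<>();
    private final Set<String> uids = new HashSet<>();

    // Appends the event unless its uid is already known; returns whether it was appended.
    // Re-importing a file or retrying a command therefore never duplicates events.
    synchronized boolean append(String uid, String value) {
        if (!uids.add(uid)) {
            return false;
        }
        events.add(new StoredEvent(events.size() + 1, uid, value));
        return true;
    }

    synchronized int size() {
        return events.size();
    }

    // Deterministic uid for line 'line' of an imported file, in the "fr:<file>:<line>" style.
    static String fileUid(String file, int line) {
        return "fr:" + file + ":" + line;
    }

    // Random uid for a live command; the client creates it once and resends it on retries.
    static String liveUid() {
        return UUID.randomUUID().toString();
    }
}
```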
/part/749572
{
"name": "Some Company"
}
{
"name": "Some Company",
"last_id": "gf01Ha"
}
{
"name": "Some Company",
"last_id": "df57Ha"
}
Part: 749572, Name: Other Name
Part: 749572, Name: Other Name, Last id: gf01Ha
Part: 749572, Name: Yet Another Name
Part: 749572, Name: Yet Another Name, Last id: gf01Ha
Using event UIDs as optimistic locks
event store
46sjGF
df57fF
/part/749572
Part: 749572, Name: Other Name, Last id: gf01Ha
Part: 749572, Name: Yet Another Name, Last id: gf01Ha
df57fF
/part/749572/df57fF
/part/749572/46sjGF
Event UIDs are non-numeric to avoid confusion with sequence numbers.
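Using the last event uid as an optimistic lock, sketched for the rename example above: a write carries the uid the client last read, and the store rejects it if another event has been written since. `OptimisticPartStore` is a hypothetical name, and generating uids with `UUID.randomUUID()` is an assumption made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

final class OptimisticPartStore {
    private record Version(String name, String lastId) {}

    private final Map<String, Version> parts = new HashMap<>();

    // Renames the part only if 'expectedLastId' is still the uid of its latest event
    // (null for a part without events). Returns the new event's uid, or null if another
    // write got there first, in which case the caller must re-read and retry.
    synchronized String rename(String partId, String newName, String expectedLastId) {
        Version current = parts.get(partId);
        String actualLastId = current == null ? null : current.lastId();
        if (!Objects.equals(actualLastId, expectedLastId)) {
            return null;
        }
        // A UUID string contains hyphens, so it cannot be mistaken for a sequence number.
        String uid = UUID.randomUUID().toString();
        parts.put(partId, new Version(newName, uid));
        return uid;
    }

    synchronized String name(String partId) {
        Version current = parts.get(partId);
        return current == null ? null : current.name();
    }
}
```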
Deleting events and compaction events
Why would you want to delete events?
1. Because you want to.
Storage space is not free, after all.
2. Because you should.
Storing obsolete personal data makes you a target for attackers and is immoral.
3. Because you have to.
Laws like the GDPR demand physical erasure.
Deleting events with tombstones
event store
17028421937, Some Woman, Drammen
11059955214, Some Man, Oslo
11059955214, Some Man, Drammen
11059955214, [tombstone]
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Drammen"
}
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Oslo"
}
{
"fnr": "17028421937",
"name": "Some Woman",
"city": "Drammen"
}
/part/9950174/1 /part/294851/2 /part/9950174/3
Tombstones themselves must not be deleted, so that the deletion can propagate to all services.
For this reason, it is crucial to choose primary identifiers that do not contain personal data (unlike a fødselsnummer).
Ideally, an internal, synthetic identifier is used as a proxy for each personal identifier.
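A minimal sketch of deletion by tombstone, assuming events are keyed by an internal part id (such as `9950174`) rather than a fødselsnummer, so the surviving tombstone holds no personal data. `LogEvent` and `TombstoningLog` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event shape: 'id' is the internal part id, never a personal identifier.
record LogEvent(long sequence, String id, String value, boolean tombstone) {}

final class TombstoningLog {
    private final List<LogEvent> events = new ArrayList<>();
    private long nextSequence = 1;

    synchronized void append(String id, String value) {
        events.add(new LogEvent(nextSequence++, id, value, false));
    }

    // Physically erases all events of 'id' and appends a tombstone in their place.
    // The tombstone is kept so that every downstream service learns of the deletion.
    synchronized void delete(String id) {
        events.removeIf(event -> event.id().equals(id));
        events.add(new LogEvent(nextSequence++, id, null, true));
    }

    synchronized List<LogEvent> events() {
        return List.copyOf(events);
    }
}
```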
Compacting events with compaction events
17028421937, Some Woman, Drammen
11059955214, Some Man, Oslo
11059955214, Some Man, Drammen
11059955214, Some Man, Drammen [compaction: 3]
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Drammen"
}
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Oslo"
}
{
"fnr": "17028421937",
"name": "Some Woman",
"city": "Drammen"
}
/part/9950174/1 /part/294851/2 /part/9950174/3
{
"fnr": "11059955214",
"name": "Some Man",
"city": "Oslo",
"compacted": "3"
}
/part/9950174/1
Can be represented by same database entity.
event store
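Compaction can be sketched in the same style: all events of one id up to a given sequence are folded into a single compaction event that records how far it reaches. `CompactableEvent`, `CompactingLog` and the `merge` function are hypothetical, and which sequence the compaction event keeps is an assumption (here, that of the newest event it replaces).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical event shape; compactedUpTo is 0 for ordinary events.
record CompactableEvent(long sequence, String id, String value, long compactedUpTo) {}

final class CompactingLog {
    private final List<CompactableEvent> events = new ArrayList<>(); // ordered by sequence

    void append(long sequence, String id, String value) {
        events.add(new CompactableEvent(sequence, id, value, 0));
    }

    // Replaces all events of 'id' with a sequence up to 'upTo' by one compaction event carrying
    // the merged state. 'merge' combines an older and a newer value (domain-specific logic).
    void compact(String id, long upTo, BinaryOperator<String> merge) {
        String merged = null;
        int lastIndex = -1;
        for (int i = 0; i < events.size(); i++) {
            CompactableEvent event = events.get(i);
            if (event.id().equals(id) && event.sequence() <= upTo) {
                merged = merged == null ? event.value() : merge.apply(merged, event.value());
                lastIndex = i;
            }
        }
        if (lastIndex < 0) {
            return; // nothing to compact
        }
        CompactableEvent compaction =
                new CompactableEvent(events.get(lastIndex).sequence(), id, merged, upTo);
        events.set(lastIndex, compaction);
        events.removeIf(event -> event != compaction && event.id().equals(id) && event.sequence() <= upTo);
    }

    List<CompactableEvent> events() {
        return List.copyOf(events);
    }
}
```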
Concept
Implementation
Operation
What is out there?
API wrapper for MongoDB.
Originates from the .NET space.
Java client but Scala-oriented.
Java framework for CQRS.
Strict command and event separation.
Support for JDBC integration.
Append-only database.
Only recently published.
DIY at Skatteetaten. Reasons for choice:
1. Performance.
Streaming has a high overhead for mass processing.
Need for micro-batching to allow for microservice orchestration.
2. Complexity
Event sourcing is not yet mainstream. APIs often feel immature.
Event stores often aim for distributability at the cost of simplicity.
3. Loose command-to-aggregate mapping
Many frameworks assume that there exists an obvious mapping
of any command to an aggregate.
class Event {
long sequence; // 0 if not set
String uid;
String id;
String type; // XML namespace id
String value; // XML
}
Events and event stores
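import java.util.function.Consumer;
import java.util.stream.Stream;
interface EventStore {
  Stream<Event> read(long afterSequence); // events with a sequence greater than afterSequence
  ClosableConsumer<Event> write();
}
// assumed definition: a Consumer usable in try-with-resources without a checked exception
interface ClosableConsumer<T> extends Consumer<T>, AutoCloseable { void close(); }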
EventStore source, target;
try (Stream<Event> stream = source.read(0);
ClosableConsumer<Event> consumer = target.write()) {
stream.forEach(consumer);
}
class SQLEventStore implements EventStore
class InMemoryEventStore implements EventStore
class HttpEventStore implements EventStore
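The `InMemoryEventStore` listed above could be sketched as follows. The `Event` fields and the `EventStore` methods come from the slide; the constructor on `Event`, the definition of `ClosableConsumer`, and the choice to make written events visible only when the consumer is closed are assumptions. Duplicate uids are skipped, mirroring the SQL insert that follows.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.stream.Stream;

class Event {
    long sequence; // 0 if not set
    String uid;
    String id;
    String type;
    String value;

    Event(String uid, String id, String type, String value) {
        this.uid = uid;
        this.id = id;
        this.type = type;
        this.value = value;
    }
}

// Assumed definition: a Consumer that can be closed without a checked exception.
interface ClosableConsumer<T> extends Consumer<T>, AutoCloseable {
    @Override
    void close();
}

interface EventStore {
    Stream<Event> read(long afterSequence);
    ClosableConsumer<Event> write();
}

class InMemoryEventStore implements EventStore {
    private final List<Event> events = new ArrayList<>(); // ordered by sequence
    private final Set<String> uids = new HashSet<>();

    @Override
    public synchronized Stream<Event> read(long afterSequence) {
        // Copy under the lock so the returned stream is not affected by later writes.
        return new ArrayList<>(events).stream().filter(event -> event.sequence > afterSequence);
    }

    @Override
    public ClosableConsumer<Event> write() {
        List<Event> batch = new ArrayList<>();
        return new ClosableConsumer<>() {
            @Override
            public void accept(Event event) {
                batch.add(event);
            }

            @Override
            public void close() {
                synchronized (InMemoryEventStore.this) {
                    for (Event event : batch) {
                        if (uids.add(event.uid)) { // skip events whose uid is already stored
                            Event stored = new Event(event.uid, event.id, event.type, event.value);
                            stored.sequence = events.size() + 1; // sequences are assigned by the store
                            events.add(stored);
                        }
                    }
                }
            }
        };
    }
}
```

With this in place, the copy loop from the slide (`source.read(0)` into `target.write()`) works unchanged between two in-memory stores.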
Events and event stores
LOCK TABLE events IN EXCLUSIVE MODE;
INSERT INTO events (sequence, uid, id, type, value)
SELECT seq.NEXTVAL, ?, ?, ?, ?
FROM dual
WHERE ? NOT IN (SELECT uid FROM events)
SELECT *
FROM events
WHERE sequence > 0
ORDER BY sequence
FETCH FIRST 1000 ROWS ONLY
SELECT /*+ index(events seq) */ *
FROM events
WHERE sequence > 0
ORDER BY sequence
FETCH FIRST 1000 ROWS ONLY
interface AggregateStore {
Optional<String> read(String id, long sequence);
}
interface WriteableAggregateStore extends AggregateStore {
void write(String id, long sequence, String aggregate);
}
Aggregates and aggregate stores
EventStore source;
WriteableAggregateStore target;
try (Stream<Event> stream = source.read(0)) {
  stream.forEach(event -> {
    // update the latest known state of this aggregate, or create it for its first event
    String aggregate = target.read(event.id, event.sequence)
      .map(previous -> Domain.updateAggregate(previous, event.value))
      .orElseGet(() -> Domain.newAggregate(event.value));
    target.write(event.id, event.sequence, aggregate);
  });
}
class SQLAggregateStore implements WriteableAggregateStore
class InMemoryAggregateStore implements WriteableAggregateStore
class HttpAggregateStore implements AggregateStore
Aggregates and aggregate stores
SELECT s.id, s.value
FROM aggregates s
INNER JOIN (
SELECT MAX(sequence) ms, id
FROM aggregates
WHERE sequence <= ?
GROUP BY id
) t
ON s.id = t.id
AND s.sequence = t.ms
WHERE s.id = ?
INSERT INTO aggregates (sequence, id, value)
VALUES (?, ?, ?)
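An `InMemoryAggregateStore` along these lines is one way to mirror the SQL above: every version of an aggregate is kept by sequence, and a read returns the newest version at or before the requested sequence, like `MAX(sequence) ... WHERE sequence <= ?`. The interfaces are repeated from the slide so the block compiles on its own; the class body is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

interface AggregateStore {
    Optional<String> read(String id, long sequence);
}

interface WriteableAggregateStore extends AggregateStore {
    void write(String id, long sequence, String aggregate);
}

// Keeps every version of every aggregate, keyed by the sequence of the event that produced it.
class InMemoryAggregateStore implements WriteableAggregateStore {
    private final Map<String, NavigableMap<Long, String>> versions = new HashMap<>();

    // Newest version at or before 'sequence'; empty if the aggregate did not exist yet.
    @Override
    public synchronized Optional<String> read(String id, long sequence) {
        NavigableMap<Long, String> byId = versions.get(id);
        if (byId == null) {
            return Optional.empty();
        }
        Map.Entry<Long, String> entry = byId.floorEntry(sequence);
        return entry == null ? Optional.empty() : Optional.of(entry.getValue());
    }

    @Override
    public synchronized void write(String id, long sequence, String aggregate) {
        versions.computeIfAbsent(id, key -> new TreeMap<>()).put(sequence, aggregate);
    }
}
```

Reading `9950174` at sequence 2 returns the Oslo state, matching the `/part/9950174/2` example earlier in the deck.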
Testing
event store
<events>
<event>
<id>fnsdjFD94d</id>
<type>sample-event</type>
<value>some-event</value>
</event>
</events>
{
"state": "some-event"
}
supply
assert
Test automation
test: example
timeout: 10000
applications:
  - eventstore
  - identity-management
  - folkeregister-export
given:
  - application: eventstore
    POST: ./some-events.xml
when:
  - application: folkeregister-export
    GET: /info/sequence
    text: 10
then:
  - application: folkeregister-export
    GET: /part/947652
    json: ./some-result.json
example.yml
~$ part-test-cli example.yml
PartTestRunner extends JUnitRunner
/events/1380 /events/0 /events/1000
Polling or pushing events
event store
event store
/socket/0
require 1000
event store
/events/1320
/udp/broadcast
Interval-polling:
- Adds network overhead
- Interval adds latency
- Serves as a heartbeat
- Simple and works well with few consumers
Websockets:
- Allows for reactive programming
- Fast processing can break micro-batching
- Optimizes for low-latency
- Slow in unstable networks
Broadcast-triggered polling:
- Avoids long-lasting connections
- Scales better with consumer count
- Adds latency in unstable networks
- Broadcast can be delayed under high load
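Interval polling with micro-batches, as a hedged sketch: each round reads batches of at most `batchSize` events after the last processed sequence until the feed is empty. `SequenceFeed` and `MicroBatchPoller` are hypothetical, and plain sequence numbers stand in for events.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical feed: up to 'limit' event sequences after 'afterSequence', in order.
interface SequenceFeed {
    List<Long> read(long afterSequence, int limit);
}

final class MicroBatchPoller {
    private final SequenceFeed feed;
    private final int batchSize;
    private long lastSequence; // progress marker; a real service would persist it

    MicroBatchPoller(SequenceFeed feed, int batchSize, long lastSequence) {
        this.feed = feed;
        this.batchSize = batchSize;
        this.lastSequence = lastSequence;
    }

    // One polling round: drains the feed in micro-batches and returns the number of events
    // handled. A scheduler, for example a ScheduledExecutorService using
    // scheduleWithFixedDelay, would call this once per interval.
    int poll(Consumer<List<Long>> batchHandler) {
        int handled = 0;
        while (true) {
            List<Long> batch = feed.read(lastSequence, batchSize);
            if (batch.isEmpty()) {
                return handled;
            }
            batchHandler.accept(batch); // the whole batch is handled at once, not event by event
            lastSequence = batch.get(batch.size() - 1);
            handled += batch.size();
        }
    }

    long lastSequence() {
        return lastSequence;
    }
}
```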
Things to mention
1. We do not process single events.
Instead of true streaming, we apply "micro-batching".
Without it, HTTP calls between microservices would stall our system.
2. We do not use transactions.
In case of an error, we simply reset an aggregate store to the last known sequence id.
This also allows us to use multiple databases, such as Oracle and Elasticsearch, without XA.
3. We have cut some corners.
To save time and money, not everything presented here is implemented at Skatteetaten.
4. Asynchronicity and eventual consistency are optional concepts.
By processing messages as they arrive, it is possible to implement an event-sourced
application without eventually consistent state.
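Point 2 (no transactions, reset to the last known sequence) can be sketched like this: a failed micro-batch is undone by deleting everything written after the last completed batch, so each store, whether Oracle or Elasticsearch, can be reset on its own without XA. `ResettableAggregateStore` is a hypothetical name, and the `process` function stands in for domain logic.

```java
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

final class ResettableAggregateStore {
    // sequence -> aggregate written while processing that event (one aggregate type for brevity)
    private final NavigableMap<Long, String> versions = new TreeMap<>();
    private long lastKnownSequence; // last sequence of a fully processed micro-batch

    // Applies one micro-batch without a transaction. On failure, everything written after the
    // last known sequence is deleted, and the batch can simply be processed again later.
    void applyBatch(List<Map.Entry<Long, String>> batch, UnaryOperator<String> process) {
        if (batch.isEmpty()) {
            return;
        }
        try {
            for (Map.Entry<Long, String> event : batch) {
                versions.put(event.getKey(), process.apply(event.getValue()));
            }
            lastKnownSequence = batch.get(batch.size() - 1).getKey();
        } catch (RuntimeException e) {
            versions.tailMap(lastKnownSequence, false).clear(); // reset to the last known sequence
            throw e;
        }
    }

    int size() {
        return versions.size();
    }

    long lastKnownSequence() {
        return lastKnownSequence;
    }
}
```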
Concept
Implementation
Operation
The event store as a bottleneck
Writers: writer 1, writer 2
event store
Readers: reader 1, reader 2
Scaling reads by event store replication
Writers: writer 1, writer 2
event store, event store mirror 1, event store mirror 2
Readers: reader 1, reader 2, reader 3, reader 4
Scaling reads by splitting reader responsibility
Writers: writer 1, writer 2
event store
Aggregators: reader 1 (aggregator), reader 2 (aggregator)
APIs: two instances each of reader 1 (API) and reader 2 (API)
Scaling writes via buffers (with priority)
Writers: writer 1, writer 2
Buffers: buffer 1, buffer 2
event store
Readers: reader 1, reader 2
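A write buffer with priorities might look like this: writers enqueue and return immediately, while a single process drains micro-batches, highest priority first, into the event store. `PriorityWriteBuffer` and `BufferedWrite` are hypothetical, and treating lower numbers as higher priority is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// A buffered write: a lower 'priority' value is written first; 'arrival' keeps FIFO order within a priority.
record BufferedWrite(int priority, long arrival, String payload) {}

final class PriorityWriteBuffer {
    private static final Comparator<BufferedWrite> ORDER =
            Comparator.comparingInt(BufferedWrite::priority).thenComparingLong(BufferedWrite::arrival);

    private final AtomicLong arrivals = new AtomicLong();
    private final PriorityBlockingQueue<BufferedWrite> queue = new PriorityBlockingQueue<>(16, ORDER);

    // Called concurrently by writers; returns immediately instead of waiting for the event store.
    void offer(int priority, String payload) {
        queue.add(new BufferedWrite(priority, arrivals.getAndIncrement(), payload));
    }

    // Called by the single process that writes to the event store: one micro-batch of up to 'max'.
    List<String> drain(int max) {
        List<String> batch = new ArrayList<>();
        BufferedWrite next;
        while (batch.size() < max && (next = queue.poll()) != null) {
            batch.add(next.payload());
        }
        return batch;
    }
}
```

For instance, interactive user commands could be given priority over bulk imports.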
Share-nothing event store
Writers: writer 1, writer 2
Readers: reader 1, reader 2
event store (key space 1), event store (key space 2)
sequence mod: 1, sequence mod: 0
Messages: broadcast expiration requests, redirect requests
ks1, ks1, ks2, ks2
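One reading of the `sequence mod: 1` / `sequence mod: 0` labels is that each key space issues only sequence numbers with its own remainder, so two stores can number events independently without collisions. The sketch below follows that interpretation, which is an assumption, as is hash-based routing of ids to key spaces; `KeySpaceStore` and `KeySpaceRouter` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// One share-nothing event store that owns a key space and issues its own sequence numbers.
final class KeySpaceStore {
    private final int keySpaces;
    private long next;

    // Issues only sequences with sequence % keySpaces == remainder, so sequences from
    // different key spaces never collide even though the stores never coordinate.
    KeySpaceStore(int keySpaces, int remainder) {
        this.keySpaces = keySpaces;
        this.next = remainder == 0 ? keySpaces : remainder; // first positive sequence with that remainder
    }

    synchronized long nextSequence() {
        long sequence = next;
        next += keySpaces;
        return sequence;
    }
}

final class KeySpaceRouter {
    private final List<KeySpaceStore> stores = new ArrayList<>();

    KeySpaceRouter(int keySpaces) {
        for (int i = 0; i < keySpaces; i++) {
            // key space i + 1 gets remainder (i + 1) % keySpaces: with two stores, 1 and 0
            stores.add(new KeySpaceStore(keySpaces, (i + 1) % keySpaces));
        }
    }

    // All events of one id always go to the same key space (hash-based routing).
    KeySpaceStore storeFor(String id) {
        return stores.get(Math.floorMod(id.hashCode(), stores.size()));
    }
}
```

Interleaved sequences are unique, but they do not give a total order of events across key spaces, which is the trade-off noted under "Things to mention" below.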
Observing event-processing of distributed services
Things to mention
1. Full partitioning (sharding) conflicts with a total order of all events in the store.
Message log systems such as Kafka use partitions to achieve performance.
This might hinder future services that want to aggregate events from different partitions.
2. Beware of time-based sequencing.
Databases like MongoDB generate ordering ids based on the system clock.
Clocks are not fully reliable, even when synchronized via NTP.
3. We split reader responsibility into aggregator and API for blue/green deployment.
As we do not require full versioning of all components, a parallel deployment of an
application allows us to recreate a "fixed" version that can replace an older version.
4. Operating microservices requires a significant amount of resources.
HTTP and (un-)marshalling are expensive operations. While enabling scalability, a
distributed architecture requires a baseline of additional resources to match the
level of centralized applications.
http://rafael.codes
@rafaelcodes
http://documents4j.com
https://github.com/documents4j/documents4j
http://bytebuddy.net
https://github.com/raphw/byte-buddy

Contenu connexe

Plus de Rafael Winterhalter

An introduction to JVM performance
An introduction to JVM performanceAn introduction to JVM performance
An introduction to JVM performanceRafael Winterhalter
 
Making Java more dynamic: runtime code generation for the JVM
Making Java more dynamic: runtime code generation for the JVMMaking Java more dynamic: runtime code generation for the JVM
Making Java more dynamic: runtime code generation for the JVMRafael Winterhalter
 
Understanding Java byte code and the class file format
Understanding Java byte code and the class file formatUnderstanding Java byte code and the class file format
Understanding Java byte code and the class file formatRafael Winterhalter
 
A topology of memory leaks on the JVM
A topology of memory leaks on the JVMA topology of memory leaks on the JVM
A topology of memory leaks on the JVMRafael Winterhalter
 

Plus de Rafael Winterhalter (8)

Migrating to JUnit 5
Migrating to JUnit 5Migrating to JUnit 5
Migrating to JUnit 5
 
The Java memory model made easy
The Java memory model made easyThe Java memory model made easy
The Java memory model made easy
 
An introduction to JVM performance
An introduction to JVM performanceAn introduction to JVM performance
An introduction to JVM performance
 
Java byte code in practice
Java byte code in practiceJava byte code in practice
Java byte code in practice
 
Making Java more dynamic: runtime code generation for the JVM
Making Java more dynamic: runtime code generation for the JVMMaking Java more dynamic: runtime code generation for the JVM
Making Java more dynamic: runtime code generation for the JVM
 
Unit testing concurrent code
Unit testing concurrent codeUnit testing concurrent code
Unit testing concurrent code
 
Understanding Java byte code and the class file format
Understanding Java byte code and the class file formatUnderstanding Java byte code and the class file format
Understanding Java byte code and the class file format
 
A topology of memory leaks on the JVM
A topology of memory leaks on the JVMA topology of memory leaks on the JVM
A topology of memory leaks on the JVM
 

Dernier

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 

Dernier (20)

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

Event-Sourcing Microservices on the JVM

  • 1. Event-Sourcing Microservices on the JVM at the Norwegian Tax Authority
  • 3. 0 Accountcredit 100 credit 50 debit 200 credit 150 2018/05/02 14:30 credited 100 to account 2018/05/02 18:15 credited 50 to account 2018/05/05 10:00 debitted 200 from account 2018/05/06 12:00 credited 150 to account 100150-50100 Logging events
  • 4. 0 Accountcredit 100 credit 50 debit 200 credit 150 100150-50100 0 Audit 100 150 -50 State auditing
  • 5. 0 Snapshot 100150-50100 credit 100 credit 50 debit 200 credit 150 credited 100 Events credited 50 debited 200 credited 150 Event sourcing
  • 6. 0 Snapshot 100150-50100 credit 100 credit 50 debit 200 credit 150 credited 100 Events credited 50 debited 200 credited 150 Event sourcing: resetting snapshots
  • 7. 0 Snapshot 100150300 debit 200 credit 150 credited 100 Events credited 50 overrun credited 150 credit 100 credit 50 Event sourcing: events and commands
  • 8. Command query [responsibility] segregation (CQ[R]S) commands do not query state queries do not change state
  • 9. Who is paying taxes in Norway? passport foreigner id citizen id international locally registered own employ ownrelate
  • 10. “Partsregister”: tracking taxable entities in Norway folkeregister enhetsregister Toll Skatteetaten event store id (part) management searchdetails relationshipsexports legacy register This is a schematic view only.
  • 11. The promise of event-sourcing and our experience • Event-sourcing allows you to easily change snapshot representation • unless you did not sufficiently future-proof event capture • Event-sourcing makes snapshots redundant by replaying events unless the event-processing code changes • Event-sourcing implies full auditability of your application • unless an error happens during command-to-event processing • Event-sourcing offers an easy way of debugging applications • unless events are trivial compared to command input • Event-sourcing is an easy gateway to share-nothing architecture • but only if you could shard your data in the first place Disclaimer: our approach could be described as a combination of event sourcing and “command sourcing” with limited capability to scaling writes. But for us, this solution works great!
  • 12. 1105995521418Some Man 1047100000Oslo 1503XXXXX185719... 1702842193749Some Woman 9755384654Drammen 9456 A529184... 1105995521494 00000Drammen 0000XXXXX000000... 1105995521418Some Man 1047100000Oslo 1503XXXXX185719... 1702842193749Some Woman 9755384654Drammen 9456 A529184... 1105995521494 00000Drammen 0000XXXXX000000... 1105995521418Some Man 1047100000Oslo 1503XXXXX185719... 1702842193749Some Woman 9755384654Drammen 9456 A529184... 1105995521494 00000Drammen 0000XXXXX000000... 1105995521418Some Man 1047100000Oslo 1503XXXXX185719... 1702842193749Some Woman 9755384654Drammen 9456 A529184... 1105995521494 00000Drammen 0000XXXXX000000... 1105995521418Some Man 1047100000Oslo 1503XXXXX185719... 1702842193749Some Woman 9755384654Drammen 9456 A529184... 1105995521494 00000Drammen 0000XXXXX000000... 1105995521418Some Man 1047100000Oslo 1503XXXXX185719... 1702842193749Some Woman 9755384654Drammen 9456 A529184... 1105995521494 00000Drammen 0000XXXXX000000... folkeregister Why “command sourcing”? The presented file formats are simplified for didactical reasons. event store { "fnr": "11059955214", "name": "Some Man" } { "fnr": "11059955214", "name": "gome Man" } { "fnr": "11059955214", "name": "Some Man", "city": "Oslo" }
  • 13. 1105995521418Some Man 1047100000Oslo 1503XXXXX185719... 1702842193749Some Woman 9755384654Drammen 9456 A529184... 1105995521494 00000Drammen 0000XXXXX000000... folkeregister Persist events for mistakes that need explicit correction The presented file formats are simplified for didactical reasons. event store { "fnr": "11059955214", "pnr": "9950174" }
  • 14. { "fnr": "11059955214", "name": "Some Man", "city": "Oslo" } Event-dependent state and sequence numbers { "fnr": "11059955214", "name": "Some Man", "city": "Drammen" } /part/9950174/1 /part/9950174 event store 11059955214 Some Man Oslo 17028421937 Some Woman Drammen 11059955214 Some Man Drammen sequence: 1 sequence: 2 sequence: 3 11059955214 Some Man Oslo 11059955214 Some Man Drammen /part/9950174/3/part/9950174/2
  • 15. Using sequence numbers for dealing with eventual consistency. event store: /part/9950174/2 { "fnr": "11059955214", "name": "Some Man", "city": "Oslo" }. Request to /rel/7573509 with header X-Sequence: 2 and body { "owner": "9950174" }. Services at last-event: 3 and last-event: 2.
  • 16. Using sequence numbers for dealing with eventual consistency. event store: /part/9950174/3. Request to /rel/7573509 with header X-Sequence: 3 and body { "owner": "9950174" }. Response: BAD REQUEST: { "sequence": "2" }. Services at last-event: 2 and last-event: 3.
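The check on slides 15 and 16 can be sketched as a small guard that a reading service consults before answering. This is a minimal sketch, assuming the service tracks the last event sequence it has applied; the class and method names are hypothetical, and mapping the error body to an HTTP 400 response is left to whatever web framework is in use.

```java
import java.util.Optional;

// Hypothetical read-side guard: a service that has applied events up to
// lastProcessed refuses requests whose X-Sequence header demands a later state.
class SequenceGuard {
    private long lastProcessed;

    SequenceGuard(long lastProcessed) {
        this.lastProcessed = lastProcessed;
    }

    // Called by the event-processing loop after each applied (micro-)batch.
    synchronized void advanceTo(long sequence) {
        lastProcessed = Math.max(lastProcessed, sequence);
    }

    // Returns an error body (to be sent as BAD REQUEST) if this service is behind
    // the sequence the client has already observed, or empty if it can answer.
    synchronized Optional<String> check(long requiredSequence) {
        if (requiredSequence > lastProcessed) {
            return Optional.of("{ \"sequence\": \"" + lastProcessed + "\" }");
        }
        return Optional.empty();
    }
}

class SequenceGuardDemo {
    public static void main(String[] args) {
        SequenceGuard relations = new SequenceGuard(2);
        // The client wrote /part/9950174/3 and sends X-Sequence: 3
        System.out.println(relations.check(3)); // Optional[{ "sequence": "2" }]
        relations.advanceTo(3);
        System.out.println(relations.check(3)); // Optional.empty
    }
}
```

The error body tells the client how far the service has caught up, so it can retry later instead of acting on stale state.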
  • 17. Publishing thin change feeds to expose application state. event store: 11059955214 Some Man Oslo, 17028421937 Some Woman Drammen, 11059955214 Some Man Drammen. Feed: 9950174 sequence 1 (/part/9950174/1), 294851 sequence 2 (/part/294851/2), 9950174 sequence 3 (/part/9950174/3). Aggregates: { "fnr": "11059955214", "name": "Some Man", "city": "Oslo" } { "fnr": "17028421937", "name": "Some Woman", "city": "Drammen" } { "fnr": "11059955214", "name": "Some Man", "city": "Drammen" } { "fnr": "11059955214", "name": "Some Man", "city": "grammen" }
  • 18. Revisioning aggregates for idempotency. event store: 17028421937 Some Woman Drammen, 11059955214 Some Man Oslo, 11059955214 Some Man Drammen. Existing aggregates: { "fnr": "11059955214", "name": "Some Man", "city": "Oslo" } { "fnr": "17028421937", "name": "Some Woman", "city": "Drammen" } { "fnr": "11059955214", "name": "Some Man", "city": "grammen" }. After reprocessing, each aggregate is diffed against its previous version: { "fnr": "11059955214", "name": "Some Man", "city": "Oslo" } { "fnr": "17028421937", "name": "Some Woman", "city": "Drammen" } { "fnr": "11059955214", "name": "Some Man", "city": "Drammen" }. Resources: /part/9950174/1/1, /part/294851/2/1, /part/9950174/3/1, /part/9950174/3[/2]
  • 19. Publishing revisions in a feed. Feed: 9950174 sequence 1 revision 1 (/part/9950174/1/1) { "fnr": "11059955214", "name": "Some Man", "city": "Oslo" }; 294851 sequence 2 revision 1 (/part/294851/2/1) { "fnr": "17028421937", "name": "Some Woman", "city": "Drammen" }; 9950174 sequence 3 revision 1 (/part/9950174/3/1) { "fnr": "11059955214", "name": "Some Man", "city": "grammen" }; 9950174 sequence 3 revision 2 (/part/9950174/3/2) { "fnr": "11059955214", "name": "Some Man", "city": "Drammen" }
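Revisioning can be sketched as a store that keeps a list of revisions per (id, sequence) and only appends when reprocessing actually changes the result, so rerunning unchanged code publishes nothing new. This is a minimal sketch under that assumption; the class name is hypothetical, and comparing serialized strings stands in for the structural diff shown on slide 18.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical revisioned aggregate store: the aggregate computed for an
// (id, sequence) pair is published under /part/{id}/{sequence}/{revision}.
class RevisionedAggregates {
    private final Map<String, List<String>> revisions = new HashMap<>();

    // Returns the path under which the aggregate is (or already was) published.
    synchronized String publish(String id, long sequence, String aggregate) {
        String key = "/part/" + id + "/" + sequence;
        List<String> history = revisions.computeIfAbsent(key, k -> new ArrayList<>());
        // Comparing serialized forms stands in for a structural diff here.
        boolean unchanged = !history.isEmpty()
                && history.get(history.size() - 1).equals(aggregate);
        if (!unchanged) {
            history.add(aggregate);
        }
        return key + "/" + history.size(); // revisions are numbered from 1
    }
}

class RevisionDemo {
    public static void main(String[] args) {
        RevisionedAggregates store = new RevisionedAggregates();
        System.out.println(store.publish("9950174", 3, "{ \"city\": \"grammen\" }")); // /part/9950174/3/1
        // Reprocessing with the same (buggy) code: no new revision
        System.out.println(store.publish("9950174", 3, "{ \"city\": \"grammen\" }")); // /part/9950174/3/1
        // Reprocessing after the bug fix: a second revision is published
        System.out.println(store.publish("9950174", 3, "{ \"city\": \"Drammen\" }")); // /part/9950174/3/2
    }
}
```

Feed consumers can then treat a new revision number as the only signal that they need to refresh their copy.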
  • 20. Using the event store as a single source of truth. event store: read commands, write events, read events, recover/replicate events. Advantages of “command sourcing”: 1. Self-healing state after any bug fix, without any user management. 2. Only the command-to-event mapping is domain-specific code. 3. Minimal risk of misinterpreting events after updates. Downside: command-to-event processing must be stateless to allow reprocessing. Revision-sensitive event observers can often remedy this limitation.
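The self-healing property follows from keeping the commands and deriving events through a stateless mapping: after a bug fix, replaying the stored commands regenerates corrected events. This is a minimal sketch, assuming a simplified, hypothetical "fnr;name" command format; the class names are hypothetical, and the buggy mapper merely reproduces the "gome Man" symptom from slide 12 rather than its actual cause, which the slides do not state.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch of "command sourcing": stored commands are the source of
// truth, and events are derived from them by a stateless mapping.
class CommandSourcing {
    // The event for a command depends on nothing but the command itself, so
    // replaying all commands through a fixed mapper regenerates all events.
    static List<String> replay(List<String> commands, Function<String, String> toEvent) {
        return commands.stream().map(toEvent).collect(Collectors.toList());
    }
}

class CommandSourcingDemo {
    static String toJson(String fnr, String name) {
        return "{ \"fnr\": \"" + fnr + "\", \"name\": \"" + name + "\" }";
    }

    // A buggy mapper that mangles the first letter of the name ("Some" becomes "gome").
    static final Function<String, String> BUGGY = line -> {
        String[] fields = line.split(";");
        return toJson(fields[0], "g" + fields[1].substring(1));
    };

    // The mapper after the bug has been fixed.
    static final Function<String, String> FIXED = line -> {
        String[] fields = line.split(";");
        return toJson(fields[0], fields[1]);
    };

    public static void main(String[] args) {
        List<String> commands = List.of("11059955214;Some Man");
        System.out.println(CommandSourcing.replay(commands, BUGGY)); // [{ "fnr": "11059955214", "name": "gome Man" }]
        System.out.println(CommandSourcing.replay(commands, FIXED)); // [{ "fnr": "11059955214", "name": "Some Man" }]
    }
}
```

If the mapper consulted current aggregate state instead, replaying would no longer be guaranteed to reproduce the same events, which is the downside the slide names.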
  • 21. Event UIDs for idempotency of write operations. folkeregister: 1105995521418Some Man 1047100000Oslo 1503XXXXX185719... 1702842193749Some Woman 9755384654Drammen 9456 A529184... 1105995521494 00000Drammen 0000XXXXX000000... event store: fr:fileABC:1 (Fnr: 11059955214, Event id: 18, Name: Some Man, City: Oslo); fr:fileABC:2 (Fnr: 17028421937, Event id: 49, Name: Some Woman, City: Drammen); fr:fileABC:3 (Fnr: 11059955214, Event id: 94, City: Drammen). Unique keys can also be chosen as UUIDs for live commands.
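The UID scheme makes imports idempotent: an event whose UID is already stored is ignored, so re-reading file ABC after a crash cannot duplicate events. This is a minimal in-memory sketch of that rule, mirroring the `WHERE ? NOT IN (SELECT uid FROM events)` guard from slide 29; the class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory event log that ignores events whose UID it already holds.
class IdempotentEventLog {
    static final class Event {
        final long sequence;
        final String uid;
        final String id;
        final String value;

        Event(long sequence, String uid, String id, String value) {
            this.sequence = sequence;
            this.uid = uid;
            this.id = id;
            this.value = value;
        }
    }

    private final List<Event> events = new ArrayList<>();
    private final Set<String> uids = new HashSet<>();

    // Appends the event and returns true, or returns false if the UID was seen before.
    synchronized boolean write(String uid, String id, String value) {
        if (!uids.add(uid)) {
            return false;
        }
        events.add(new Event(events.size() + 1, uid, id, value));
        return true;
    }

    synchronized int size() {
        return events.size();
    }
}

class IdempotentImportDemo {
    public static void main(String[] args) {
        IdempotentEventLog log = new IdempotentEventLog();
        // Importing file ABC twice, e.g. after a crash during the first run
        for (int run = 0; run < 2; run++) {
            log.write("fr:fileABC:1", "11059955214", "Some Man, Oslo");
            log.write("fr:fileABC:2", "17028421937", "Some Woman, Drammen");
            log.write("fr:fileABC:3", "11059955214", "Drammen");
        }
        System.out.println(log.size()); // 3
    }
}
```

Deriving the UID from file name and line number keeps it stable across reruns; a random UUID per live command serves the same purpose when there is no file.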
  • 22. Using event UIDs as optimistic locks. Event UIDs are non-numeric to avoid confusion with sequence numbers. /part/749572: { "name": "Some Company" }, { "name": "Some Company", "last_id": "gf01Ha" }, { "name": "Some Company", "last_id": "df57Ha" }. Commands: Part: 749572, Name: Other Name, Last id: gf01Ha; Part: 749572, Name: Yet Another Name, Last id: gf01Ha. event store: 46sjGF, df57fF. Resources: /part/749572/df57fF, /part/749572/46sjGF
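The optimistic lock can be sketched as follows: a command names the UID of the last event its sender saw for the part, and it is rejected if another event has been written since. This is a minimal sketch under that assumption; the class and method names are hypothetical, and `java.util.UUID` stands in for whatever non-numeric UID generator is actually used.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

// Hypothetical optimistic locking on event UIDs ("last_id").
class OptimisticPartStore {
    private final Map<String, String> lastUid = new HashMap<>();
    private final Map<String, String> names = new HashMap<>();

    // expectedLastUid is null when the caller believes the part does not exist yet.
    // Returns the UID of the newly written event.
    synchronized String rename(String part, String expectedLastUid, String name) {
        String current = lastUid.get(part);
        if (!Objects.equals(current, expectedLastUid)) {
            throw new IllegalStateException(
                    "stale last_id " + expectedLastUid + ", current is " + current);
        }
        // Non-numeric UID, so it cannot be mistaken for a sequence number.
        String uid = UUID.randomUUID().toString();
        lastUid.put(part, uid);
        names.put(part, name);
        return uid;
    }

    synchronized String name(String part) {
        return names.get(part);
    }
}

class OptimisticLockDemo {
    public static void main(String[] args) {
        OptimisticPartStore store = new OptimisticPartStore();
        String seen = store.rename("749572", null, "Some Company");
        // Two clients both saw `seen`; the first update wins ...
        store.rename("749572", seen, "Other Name");
        try {
            // ... and the second is rejected instead of silently overwriting it.
            store.rename("749572", seen, "Yet Another Name");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(store.name("749572")); // Other Name
    }
}
```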
  • 23. Deleting events and compaction events. Why would you want to delete events? 1. Because you want to. Storage space is not free, after all. 2. Because you should. Storing obsolete personal data makes you a target for attackers and is immoral. 3. Because you have to. Laws such as the GDPR demand physical erasure.
  • 24. Deleting events with tombstones. event store: 17028421937 Some Woman Drammen, 11059955214 Some Man Oslo, 11059955214 Some Man Drammen, 11059955214 [tombstone]. Aggregates: /part/9950174/1 { "fnr": "11059955214", "name": "Some Man", "city": "Oslo" }, /part/294851/2 { "fnr": "17028421937", "name": "Some Woman", "city": "Drammen" }, /part/9950174/3 { "fnr": "11059955214", "name": "Some Man", "city": "Drammen" }. Tombstones must not be deleted themselves, so that they can propagate to all services. For this reason, it is crucial to choose primary identifiers that do not contain personal data (unlike a fødselsnummer). Ideally, an internal, synthetic identifier is used as a proxy for each personal identifier.
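Tombstone deletion can be sketched as follows: deleting an id physically removes its earlier events but appends a tombstone that stays in the log, so that every service reading the log also learns to delete its copy. This is a minimal in-memory sketch under that assumption; the class name is hypothetical, and events are keyed by the synthetic part id rather than the fødselsnummer, as the slide recommends.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event log with tombstones.
class TombstoneLog {
    static final class Event {
        final long sequence;
        final String id;    // synthetic id such as "9950174", never a fødselsnummer
        final String value; // null marks a tombstone

        Event(long sequence, String id, String value) {
            this.sequence = sequence;
            this.id = id;
            this.value = value;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    private final List<Event> events = new ArrayList<>();
    private long nextSequence = 1;

    synchronized void write(String id, String value) {
        events.add(new Event(nextSequence++, id, value));
    }

    // Physically erases the data events of `id`; tombstones are never removed.
    synchronized void delete(String id) {
        events.removeIf(e -> e.id.equals(id) && !e.isTombstone());
        events.add(new Event(nextSequence++, id, null));
    }

    synchronized List<Event> events() {
        return List.copyOf(events);
    }
}

class TombstoneDemo {
    public static void main(String[] args) {
        TombstoneLog log = new TombstoneLog();
        log.write("9950174", "Oslo");
        log.write("294851", "Drammen");
        log.write("9950174", "Drammen");
        log.delete("9950174");
        for (TombstoneLog.Event e : log.events()) {
            System.out.println(e.sequence + " " + e.id + " " + (e.isTombstone() ? "[tombstone]" : e.value));
        }
        // 2 294851 Drammen
        // 4 9950174 [tombstone]
    }
}
```

Because the tombstone survives, only its id remains in the log forever, which is why that id should not itself be personal data.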
  • 25. Compacting events with compaction events. event store: 17028421937 Some Woman Drammen, 11059955214 Some Man Oslo, 11059955214 Some Man Drammen, 11059955214 Some Man Drammen [compaction: 3]. Aggregates: /part/9950174/1 { "fnr": "11059955214", "name": "Some Man", "city": "Oslo" }, /part/294851/2 { "fnr": "17028421937", "name": "Some Woman", "city": "Drammen" }, /part/9950174/3 { "fnr": "11059955214", "name": "Some Man", "city": "Drammen" }. After compaction: /part/9950174/1 { "fnr": "11059955214", "name": "Some Man", "city": "Oslo", "compacted": "3" }. Can be represented by the same database entity.
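One possible reading of a compaction event "[compaction: 3]" is sketched below, assuming each event carries the full state of its id (as in the slide's examples): compacting an id up to a sequence keeps only the latest of those events, flagged as compacted, and drops the older ones. The class name is hypothetical, and keeping the merged event at the sequence of the latest compacted event is a choice this sketch makes; the slide's figure may place it differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical compacting event log (events kept in ascending sequence order).
class CompactingLog {
    static final class Event {
        final long sequence;
        final String id;
        final String value;
        final boolean compacted;

        Event(long sequence, String id, String value, boolean compacted) {
            this.sequence = sequence;
            this.id = id;
            this.value = value;
            this.compacted = compacted;
        }
    }

    private final List<Event> events = new ArrayList<>();
    private long nextSequence = 1;

    synchronized void write(String id, String value) {
        events.add(new Event(nextSequence++, id, value, false));
    }

    // Replaces all events of `id` with sequence <= upTo by one compacted event.
    synchronized void compact(String id, long upTo) {
        Event latest = null;
        for (Event e : events) {
            if (e.id.equals(id) && e.sequence <= upTo) {
                latest = e; // ascending order, so the last match is the newest
            }
        }
        if (latest == null) {
            return; // nothing to compact
        }
        events.removeIf(e -> e.id.equals(id) && e.sequence <= upTo);
        Event merged = new Event(latest.sequence, id, latest.value, true);
        // Re-insert at the position that keeps the log sorted by sequence.
        int position = 0;
        while (position < events.size() && events.get(position).sequence < merged.sequence) {
            position++;
        }
        events.add(position, merged);
    }

    synchronized List<Event> events() {
        return List.copyOf(events);
    }
}

class CompactionDemo {
    public static void main(String[] args) {
        CompactingLog log = new CompactingLog();
        log.write("9950174", "Oslo");
        log.write("294851", "Drammen");
        log.write("9950174", "Drammen");
        log.compact("9950174", 3);
        for (CompactingLog.Event e : log.events()) {
            System.out.println(e.sequence + " " + e.id + " " + e.value + (e.compacted ? " (compacted)" : ""));
        }
        // 2 294851 Drammen
        // 3 9950174 Drammen (compacted)
    }
}
```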
  • 27. What is out there? API wrapper for MongoDB. Originates from the .NET space. Java client, but Scala-oriented. Java framework for CQRS. Strict command and event separation. Support for JDBC integration. Append-only database. Only recently published. DIY at Skatteetaten. Reasons for choice: 1. Performance. Streaming has a high overhead for mass processing. Micro-batching is needed to allow for microservice orchestration. 2. Complexity. Event sourcing is not yet mainstream, and APIs often feel immature. Event stores often aim for distributability at the cost of simplicity. 3. Loose command-to-aggregate mapping. Many frameworks assume that there exists an obvious mapping of any command to an aggregate.
  • 28. Events and event stores
    class Event {
      long sequence; // 0 if not set
      String uid;
      String id;
      String type;   // XML namespace id
      String value;  // XML
    }
    interface EventStore {
      Stream<Event> read(long afterSequence);
      ClosableConsumer<Event> write();
    }
    EventStore source, target;
    try (Stream<Event> stream = source.read(0);
         ClosableConsumer<Event> consumer = target.write()) {
      stream.forEach(consumer);
    }
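The slide leaves `ClosableConsumer` undefined. The sketch below assumes it is simply a `Consumer` that is also `AutoCloseable`, and adds a hypothetical in-memory store that buffers one consumer's writes and appends them as a micro-batch on `close()`. It only illustrates how the interfaces fit together and is not Skatteetaten's implementation. The `type` value "urn:example:part" is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

class Event {
    long sequence; // 0 if not set
    String uid;
    String id;
    String type;   // XML namespace id
    String value;  // XML

    Event(String uid, String id, String type, String value) {
        this.uid = uid;
        this.id = id;
        this.type = type;
        this.value = value;
    }
}

// Assumed definition: a Consumer that must be closed, e.g. to flush a batch.
interface ClosableConsumer<T> extends Consumer<T>, AutoCloseable {
    @Override
    void close(); // no checked exception, so try-with-resources needs no catch
}

interface EventStore {
    Stream<Event> read(long afterSequence);
    ClosableConsumer<Event> write();
}

// Hypothetical in-memory store: assigns sequences when a write batch is closed.
class InMemoryEventStore implements EventStore {
    private final List<Event> events = new ArrayList<>();

    @Override
    public synchronized Stream<Event> read(long afterSequence) {
        // Stream over a snapshot copy so that concurrent writes do not interfere.
        return new ArrayList<>(events).stream().filter(e -> e.sequence > afterSequence);
    }

    @Override
    public ClosableConsumer<Event> write() {
        List<Event> batch = new ArrayList<>();
        return new ClosableConsumer<Event>() {
            @Override
            public void accept(Event event) {
                batch.add(event);
            }

            @Override
            public void close() {
                appendAll(batch);
            }
        };
    }

    private synchronized void appendAll(List<Event> batch) {
        for (Event e : batch) {
            Event copy = new Event(e.uid, e.id, e.type, e.value);
            copy.sequence = events.size() + 1; // this store's own sequence
            events.add(copy);
        }
    }
}

class EventStoreDemo {
    public static void main(String[] args) {
        InMemoryEventStore source = new InMemoryEventStore();
        try (ClosableConsumer<Event> batch = source.write()) {
            batch.accept(new Event("fr:fileABC:1", "9950174", "urn:example:part", "<part/>"));
            batch.accept(new Event("fr:fileABC:2", "294851", "urn:example:part", "<part/>"));
        }
        // Copying all events, as on the slide.
        InMemoryEventStore target = new InMemoryEventStore();
        try (Stream<Event> stream = source.read(0);
             ClosableConsumer<Event> consumer = target.write()) {
            stream.forEach(consumer);
        }
        System.out.println(target.read(0).count()); // 2
    }
}
```

Try-with-resources closes `consumer` before `stream`, so the copied batch is appended as soon as the loop finishes.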
  • 29. Events and event stores
    class SQLEventStore implements EventStore
    class InMemoryEventStore implements EventStore
    class HttpEventStore implements EventStore
    LOCK TABLE events IN EXCLUSIVE MODE;
    INSERT INTO events (sequence, uid, id, type, value)
      SELECT seq.NEXTVAL, ?, ?, ?, ? FROM dual
      WHERE ? NOT IN (SELECT uid FROM events)
    SELECT * FROM events WHERE sequence > ? ORDER BY sequence FETCH FIRST 1000 ROWS ONLY
    SELECT /*+ index(events seq) */ * FROM events WHERE sequence > ? ORDER BY sequence FETCH FIRST 1000 ROWS ONLY
  • 30. Aggregates and aggregate stores
    interface AggregateStore {
      Optional<String> read(String id, long sequence);
    }
    interface WriteableAggregateStore extends AggregateStore {
      void write(String id, long sequence, String aggregate);
    }
    EventStore source;
    WriteableAggregateStore target; // write() is only declared on the writeable store
    try (Stream<Event> stream = source.read(0)) {
      stream.forEach(event -> {
        String aggregate = target.read(event.id, event.sequence)
            .map(current -> Domain.updateAggregate(current, event.value))
            .orElseGet(() -> Domain.newAggregate(event.value));
        target.write(event.id, event.sequence, aggregate);
      });
    }
  • 31. Aggregates and aggregate stores
    class SQLAggregateStore implements WriteableAggregateStore
    class InMemoryAggregateStore implements WriteableAggregateStore
    class HttpAggregateStore implements AggregateStore
    SELECT s.id, s.value FROM aggregates s
      INNER JOIN (
        SELECT MAX(sequence) ms, id FROM aggregates
        WHERE sequence <= ? GROUP BY id
      ) t ON s.id = t.id AND s.sequence = t.ms
    WHERE s.id = ?
    INSERT INTO aggregates (sequence, id, value) VALUES (?, ?, ?)
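An in-memory aggregate store can mirror the SQL query above by keeping every version of an aggregate keyed by sequence and answering a read with the latest version at or before the requested sequence. This is a minimal sketch, not the slide's `InMemoryAggregateStore` itself. The extra `resetTo` method is a hypothetical helper illustrating the error handling described on slide 35 (resetting an aggregate store to the last known sequence instead of using transactions).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

interface AggregateStore {
    Optional<String> read(String id, long sequence);
}

interface WriteableAggregateStore extends AggregateStore {
    void write(String id, long sequence, String aggregate);
}

// Hypothetical in-memory store: all versions per id, ordered by sequence.
class InMemoryAggregateStore implements WriteableAggregateStore {
    private final Map<String, TreeMap<Long, String>> versions = new HashMap<>();

    @Override
    public synchronized Optional<String> read(String id, long sequence) {
        TreeMap<Long, String> byId = versions.get(id);
        if (byId == null) {
            return Optional.empty();
        }
        // Latest version written at or before the requested sequence.
        Map.Entry<Long, String> entry = byId.floorEntry(sequence);
        return entry == null ? Optional.empty() : Optional.of(entry.getValue());
    }

    @Override
    public synchronized void write(String id, long sequence, String aggregate) {
        versions.computeIfAbsent(id, k -> new TreeMap<>()).put(sequence, aggregate);
    }

    // Hypothetical helper: drops every version written after `sequence`.
    synchronized void resetTo(long sequence) {
        for (TreeMap<Long, String> byId : versions.values()) {
            byId.tailMap(sequence, false).clear();
        }
    }
}

class AggregateStoreDemo {
    public static void main(String[] args) {
        InMemoryAggregateStore store = new InMemoryAggregateStore();
        store.write("9950174", 1, "Oslo");
        store.write("9950174", 3, "Drammen");
        System.out.println(store.read("9950174", 2)); // Optional[Oslo]
        System.out.println(store.read("9950174", 3)); // Optional[Drammen]
        store.resetTo(2); // e.g. after a failed batch; events after 2 are reprocessed
        System.out.println(store.read("9950174", 3)); // Optional[Oslo]
    }
}
```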
  • 33. Test-automation
    example.yml:
    test: example
    timeout: 10000
    applications:
      - eventstore
      - identity-management
      - folkeregister-export
    given:
      - application: eventstore
        POST: ./some-events.xml
    when:
      - application: folkeregister-export
        GET: /info/sequence
        text: 10
    then:
      - application: folkeregister-export
        GET: /part/947652
        json: ./some-result.json
    ~$ part-test-cli example.yml
    PartTestRunner extends JUnitRunner
  • 34. Polling or pushing events
    Interval polling (/events/0, /events/1000, /events/1380):
    - Adds network overhead
    - Interval adds latency
    - Serves as a heartbeat
    - Simple, and works well with few consumers
    Websockets (/socket/0, require 1000):
    - Allows for reactive programming
    - Fast processing can break micro-batching
    - Optimizes for low latency
    - Slow in unstable networks
    Broadcast-triggered polling (/udp/broadcast, /events/1320):
    - Avoids long-lasting connections
    - Scales better with consumer count
    - Adds latency in unstable networks
    - Broadcast can be delayed under high load
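Interval polling combined with the micro-batching mentioned on slide 35 can be sketched as a consumer that repeatedly asks for events after the last sequence it processed, at most a fixed number at a time. This is a minimal sketch; the `EventSource` interface (standing in for something like `GET /events/{after}`) and the batch size of 1000 are illustrative assumptions, and events are reduced to their sequence numbers.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical interval-polling consumer that processes events in micro-batches.
class MicroBatchPoller {
    interface EventSource {
        List<Long> fetch(long afterSequence, int limit);
    }

    private final EventSource source;
    private final int batchSize;
    private final Consumer<List<Long>> handler;
    private long lastSequence;

    MicroBatchPoller(EventSource source, int batchSize, Consumer<List<Long>> handler, long startAfter) {
        this.source = source;
        this.batchSize = batchSize;
        this.handler = handler;
        this.lastSequence = startAfter;
    }

    // One poll: handles at most one micro-batch and returns its size
    // (0 means the consumer has caught up; an empty answer also works as a heartbeat).
    int pollOnce() {
        List<Long> batch = source.fetch(lastSequence, batchSize);
        if (!batch.isEmpty()) {
            handler.accept(batch);
            lastSequence = batch.get(batch.size() - 1);
        }
        return batch.size();
    }

    long lastSequence() {
        return lastSequence;
    }
}

class PollingDemo {
    public static void main(String[] args) {
        // Hypothetical source holding events 1..2500
        MicroBatchPoller.EventSource source = (after, limit) ->
                LongStream.rangeClosed(after + 1, Math.min(after + limit, 2500))
                        .boxed().collect(Collectors.toList());
        MicroBatchPoller poller = new MicroBatchPoller(source, 1000,
                batch -> System.out.println("batch of " + batch.size()), 0);
        // A scheduler would call pollOnce() on an interval; here we drain until empty.
        while (poller.pollOnce() > 0) {
        }
        System.out.println(poller.lastSequence()); // 2500
    }
}
```

In practice `pollOnce()` would be scheduled on a timer (for example with a `ScheduledExecutorService`), which is where the interval latency listed above comes from.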
  • 35. Things to mention 1. We do not process single events. Instead of real streaming, we apply "micro-batching". Without it, HTTP calls between microservices would hang up our system. 2. We do not use transactions. In case of an error, we simply reset an aggregate store to the last known sequence id. This also allows us to use multiple databases such as Oracle/Elasticsearch without XA. 3. We have cut some corners. To save time and money, not everything presented is implemented at Skatteetaten. 4. Asynchronicity and eventual consistency are optional concepts. By processing messages as they arrive, it is possible to implement an event-sourced application without eventually consistent state.
  • 37. The event store as a bottleneck writer 1 writer 2 reader 2 reader 1 event store
  • 38. Scaling reads by event store replication writer 1 writer 2 reader 2 reader 1 event store event store mirror 2 event store mirror 1 reader 4 reader 3
  • 39. Scaling reads by splitting reader responsibility writer 1 writer 2 reader 1 (aggregator) event store reader 2 (aggregator) reader 1 (API) reader 1 (API) reader 2 (API) reader 2 (API)
  • 40. Scaling writes via buffers (with priority) writer 1 writer 2 reader 2 reader 1 event store buffer 2 buffer 1
  • 41. Share-nothing event store writer 1 writer 2 reader 2 reader 1 event store (key space 1) event store (key space 2) broadcast expiration requests redirect requests sequence mod: 1 sequence mod: 0 ks1 ks1 ks2 ks2
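The "sequence mod" labels suggest that each share-nothing store issues only sequence numbers with its own remainder, so sequences never collide even though the stores do not coordinate. This is a minimal sketch under that assumption; routing keys by `hashCode` is a hypothetical choice (the slide only shows requests being redirected to the owning key space), and a single lock stands in for what would be separate processes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical share-nothing layout: store i owns a key space and only issues
// sequence numbers s with s % shardCount == i.
class ShardedEventStore {
    private final int shardCount;
    private final long[] nextSequence;
    private final List<List<String>> shards = new ArrayList<>();

    ShardedEventStore(int shardCount) {
        this.shardCount = shardCount;
        this.nextSequence = new long[shardCount];
        for (int i = 0; i < shardCount; i++) {
            // Smallest positive sequence with remainder i (shardCount itself for i == 0).
            nextSequence[i] = (i == 0) ? shardCount : i;
            shards.add(new ArrayList<>());
        }
    }

    // Assumed routing rule: a key always maps to the same store.
    int shardOf(String key) {
        return Math.floorMod(key.hashCode(), shardCount);
    }

    // Each store would normally run on its own; one lock is used here for brevity.
    synchronized long append(String key, String value) {
        int shard = shardOf(key);
        long sequence = nextSequence[shard];
        nextSequence[shard] += shardCount;
        shards.get(shard).add(sequence + ":" + key + ":" + value);
        return sequence;
    }
}

class ShardingDemo {
    public static void main(String[] args) {
        ShardedEventStore store = new ShardedEventStore(2);
        for (String key : List.of("9950174", "294851", "749572")) {
            long sequence = store.append(key, "event");
            System.out.println(key + " -> key space " + store.shardOf(key) + ", sequence " + sequence);
        }
    }
}
```

Sequences stay unique, but they no longer form a total order of all events across key spaces, which is the trade-off discussed on slide 43.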
  • 42. Observing event-processing of distributed services
  • 43. Things to mention 1. Full partitioning (sharding) conflicts with a total order of all events. Message log systems such as Kafka use partitions to achieve performance. This might hinder future services that want to aggregate events of different partitions. 2. Beware of time-based sequencing. Databases like MongoDB generate ordering ids based on the system clock. Timers are not fully reliable, even when using NTP. 3. We split reader responsibility into aggregator and API for blue/green deployment. As we do not require full versioning of all components, a parallel deployment of an application allows us to recreate a "fixed" version that can replace an older version. 4. Operating microservices requires a significant amount of resources. HTTP and (un-)marshalling are expensive operations. While a distributed architecture enables scalability, it requires a baseline of additional resources to match the level of centralized applications.