This document provides an overview of best practices for building data streaming APIs. It discusses various techniques for implementing streaming such as TCP/UDP multicast, HTTP streaming, WebSocket, and push notifications. It also covers challenges like protocol fallback, API design, fault tolerance, security, and data optimization. Finally, it lists several streaming libraries, tools and cloud services that can be used to build streaming applications and APIs.
2. 2CONFIDENTIAL
ABOUT ME
KANSTANTSIN SLISENKA
EPAM Systems, Lead Software Engineer
Java backend engineer
Speaker at Java Tech Talks, SEC Online, CMCC Tech Talks, IT Week
I’m interested in
• Complex Java backend, SOA, databases
• High-load, fault-tolerant, distributed systems
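POLLING
client → server: request
server → client: empty response
data source → server: new data
client → server: request
server → client: response with the new data
- Not real-time
- Useless calls
LONG POLLING
client → server: request, held open by the server
server → client: empty response if nothing arrived; the client sends a new request
data source → server: new data
server → client: response with the new data
- Not real-time
• No or fewer useless calls
STREAMING
client → server: subscribe
data source → server: new data (repeatedly)
server → client: send data for each piece of new data
• Real time
• Long-held connection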
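The "useless calls" difference between polling and long polling can be made concrete with a toy simulation. This is a minimal sketch, assuming integer time ticks and ignoring network latency; the function names and parameters are hypothetical and not taken from the slides.

```python
from typing import Dict, List


def simulate_polling(data_times: List[int], interval: int, end_time: int) -> Dict[str, int]:
    """Client asks every `interval` ticks; the server always answers at once."""
    pending = sorted(data_times)
    requests = empty = 0
    for now in range(0, end_time + 1, interval):
        requests += 1
        if any(t <= now for t in pending):
            pending = [t for t in pending if t > now]  # delivered in this response
        else:
            empty += 1  # useless call: nothing new to return
    return {"requests": requests, "empty": empty}


def simulate_long_polling(data_times: List[int], hold: int, end_time: int) -> Dict[str, int]:
    """Server holds each request until data arrives or `hold` ticks pass."""
    if hold <= 0:
        raise ValueError("hold must be positive")
    pending = sorted(data_times)
    requests = empty = 0
    now = 0
    while now <= end_time:
        requests += 1
        deadline = now + hold
        if pending and pending[0] <= deadline:
            now = max(now, pending[0])  # respond as soon as data exists
            pending = [t for t in pending if t > now]
        else:
            empty += 1  # hold time expired: empty response, client asks again
            now = deadline
    return {"requests": requests, "empty": empty}
```

With new data at ticks 3 and 17 over 20 ticks, polling every 2 ticks makes 11 requests, 9 of them empty, while long polling with a 10-tick hold makes 4 requests, 2 of them empty.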
TCP/UDP MULTICAST
Streaming at the hardware and network protocol level
• UDP multicast
• Reliable multicast protocols (TCP itself is unicast-only)
– Cisco PGM and others
• The most efficient network utilization
http://www.java67.com/2016/09/difference-between-tcp-and-udp-in-java.html
WHY TCP/UDP MULTICAST BECAME LESS POPULAR
1. Browser apps became more popular
• No full TCP/UDP support in browsers
2. Host and network virtualization
• Virtual and hardware networks are different
• No benefit from multicast as routers are not aware of virtual hosts
3. Firewall/proxy restrictions
• Usually only the HTTP protocol is not restricted in corporate networks
4. Poor multicast support by hosting providers
• Multicast is offered at additional cost
• Poor quality of service
COMET / HTTP STREAMING
BENEFITS
1. Uses only web technologies
– No more JRE, Flash, or browser plugins on the client side
DRAWBACKS
1. HTTP browser limitation
– max 6-8 parallel connections per host
– workaround with domain sharding, multiplexing
2. Poor client and server performance
– HTTP is used in a way it was not designed for
3. Proxy/firewall/browser kills the request by timeout
4. Need to handle disconnects
Should be used as a fallback only!
EVENT SOURCE API: TURNING HACK INTO STANDARD
• Standard JavaScript API
• No more hidden IFRAMEs
• Browser automatically reconnects
• Long-held HTTP call between browser and server
• One way: from server to browser
• Still poor server performance
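The long-held HTTP call used by the Event Source API carries a simple line-based text format (text/event-stream). The sketch below parses that format outside a browser. It is a minimal illustration, assuming only the `event`, `data` and `id` fields matter; the function name is hypothetical and the `retry` field is ignored.

```python
from typing import Dict, Iterable, Iterator


def parse_event_stream(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per event; a blank line ends an event."""
    event_type, data, last_id = "", [], ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # events without data are not dispatched
                yield {"event": event_type or "message",
                       "data": "\n".join(data),
                       "id": last_id}
            event_type, data = "", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            last_id = value  # remembered across events, used when reconnecting
```

The remembered `id` is what lets the browser resume after its automatic reconnect: it is sent back to the server in the `Last-Event-ID` request header.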
WEB SOCKET: TCP IN BROWSER
1. HTTP handshake (client → server)
GET /demo HTTP/1.1
Upgrade: WebSocket
Connection: Upgrade
Origin: http://site.com
2. Upgrade response, “switch protocols” header (server → client)
HTTP/1.1 101 Web Socket Protocol Handshake
Upgrade: WebSocket
Connection: Upgrade
3. Switch to TCP (ports 80/443)
WebSocket frames flow in both directions between client and server
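The handshake shown on the slide follows an early draft of the protocol. The standardized version (RFC 6455) adds a `Sec-WebSocket-Key` request header, which the server must answer with a matching `Sec-WebSocket-Accept` header. The sketch below shows that server-side step only; the function names are hypothetical, and a real server would normally leave this to its WebSocket library.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the accept-key computation.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(key: str) -> str:
    """Compute Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_response(key: str) -> str:
    """Build the 101 response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {websocket_accept(key)}\r\n"
        "\r\n"
    )
```

For the sample key from the RFC, `dGhlIHNhbXBsZSBub25jZQ==`, the accept value is `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`.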
WEB-RTC: UDP + P2P IN BROWSER
• Real-time P2P connection between browsers
• Data, audio, video
• STUN server needed for initial handshake
Handshake through the STUN server
• Browser 1 → STUN server: I AM 10.0.10.1
• Browser 2 → STUN server: I AM 10.0.25.40
• STUN server → Browser 1: HE IS 10.0.25.40
• STUN server → Browser 2: HE IS 10.0.10.1
• 10.0.10.1 ↔ 10.0.25.40: DATA, VOICE, VIDEO
https://webrtc.org/
PUSH NOTIFICATIONS
VENDOR SERVICES
• Google Cloud Messaging: Android/Chrome
• Apple Push Notification Service: iPhone, iPad, Safari
• Other services: Microsoft, Blackberry, …
Flow
1. GET TOKEN (device, from the messaging service)
2. SEND TOKEN (device, to your back-end)
3. STORE TOKEN (your back-end)
4. SEND NOTIFICATION (your back-end, to the messaging service)
5. SEND NOTIFICATION (messaging service, to the device)
Not a replacement for web-sockets!
https://www.urbanairship.com/push-notifications-explained
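The five-step token flow can be sketched with in-memory stand-ins. Everything here is hypothetical: the class and method names are invented for illustration and do not correspond to any vendor's real API.

```python
import uuid
from typing import Dict, List


class Device:
    def __init__(self, user_id: str) -> None:
        self.user_id = user_id
        self.inbox: List[str] = []


class MessagingService:
    """Stand-in for a vendor messaging service (not a real vendor API)."""

    def __init__(self) -> None:
        self._devices: Dict[str, Device] = {}

    def get_token(self, device: Device) -> str:  # step 1
        token = uuid.uuid4().hex
        self._devices[token] = device
        return token

    def send_notification(self, token: str, text: str) -> bool:  # step 5
        device = self._devices.get(token)
        if device is None:
            return False
        device.inbox.append(text)
        return True


class Backend:
    """Your back-end: stores tokens and asks the service to deliver."""

    def __init__(self, service: MessagingService) -> None:
        self._service = service
        self._tokens: Dict[str, str] = {}

    def store_token(self, user_id: str, token: str) -> None:  # steps 2-3
        self._tokens[user_id] = token

    def notify(self, user_id: str, text: str) -> bool:  # step 4
        token = self._tokens.get(user_id)
        if token is None:
            return False
        return self._service.send_notification(token, text)
```

The back-end never talks to the device directly; it only knows the token, which is why delivery time is outside its control.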
COMPARISON OF STREAMING IMPLEMENTATIONS
Feature | TCP/UDP multicast | HTTP Streaming / COMET | Event Source API | WebSocket | Web-RTC
Use in browser | NO | YES | YES | YES | YES
Use outside browser | YES | YES (makes sense for browser apps) | NO | YES | YES
Technical details | Custom protocols over TCP/UDP | Long HTTP calls | Long HTTP calls | HTTP for handshake with subsequent upgrade to TCP | P2P UDP; STUN server to exchange IP addresses
Benefits | Hardware and protocol level: most efficient network usage | Only web technology used | Easier to use than COMET | All benefits from TCP and browser apps | All benefits from TCP and browser apps
Drawbacks | Doesn’t work in browser; can be blocked by proxy/firewall | Negative impact on client and server performance | Negative impact on server performance | Needs fallback to polling if disabled by firewall/proxy | Needs intermediate discovery STUN server
DATA STREAMING CHALLENGES
1. Protocol fallback
2. API design
3. Fault-tolerance
4. Security
ARCHITECTURE OPTIMIZATION
5. Using schemas
6. Sending deltas (snapshot-update)
7. Data merging
8. Replaceable buffer
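The "sending deltas (snapshot-update)" item can be illustrated as follows: the client receives one full snapshot, then only the fields that changed. This is a minimal sketch assuming flat dictionaries as state; the function names and the delta layout (`set` / `remove`) are assumptions made for illustration.

```python
from typing import Any, Dict

_MISSING = object()  # sentinel so that a stored None still counts as a value


def make_delta(previous: Dict[str, Any], current: Dict[str, Any]) -> Dict[str, Any]:
    """Describe how to turn `previous` into `current`."""
    changed = {k: v for k, v in current.items()
               if previous.get(k, _MISSING) != v}
    removed = [k for k in previous if k not in current]
    return {"set": changed, "remove": removed}


def apply_delta(state: Dict[str, Any], delta: Dict[str, Any]) -> Dict[str, Any]:
    """Client side: apply an update on top of the last known state."""
    new_state = dict(state)
    new_state.update(delta["set"])
    for key in delta["remove"]:
        new_state.pop(key, None)
    return new_state
```

If only the temperature changes between two readings, the update carries one field instead of the whole record.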
1. PROTOCOL FALLBACK
Why it is needed
• Client doesn’t support WebSocket
• Firewall/proxy issues
• Unstable network connection
Automatic switch to another protocol
1. Try WebSocket
2. Then HTTP streaming
3. Then long polling*
4. Then polling*
* Not all applications can tolerate such high latency
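The fallback order above can be sketched as a loop over transports, tried from best to worst. This is a minimal sketch under stated assumptions: the exception class, the function name, and the `allow_high_latency` flag (modelling the footnote about latency-sensitive applications) are all hypothetical.

```python
from typing import Callable, List, Tuple


class TransportUnavailable(Exception):
    """Raised by a connect function when its transport cannot be used."""


# (name, connect function, is it a high-latency transport?)
Transport = Tuple[str, Callable[[], object], bool]


def connect_with_fallback(transports: List[Transport],
                          allow_high_latency: bool = True) -> Tuple[str, object]:
    """Return (name, connection) for the first transport that connects."""
    errors = []
    for name, connect, high_latency in transports:
        if high_latency and not allow_high_latency:
            continue  # the application cannot tolerate polling latency
        try:
            return name, connect()
        except TransportUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise TransportUnavailable("; ".join(errors) or "no transport allowed")
```

A caller would pass the list in the order WebSocket, HTTP streaming, long polling, polling, marking the last two as high-latency.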
2. STREAMING API DESIGN
onMessage
• Lots of if-else blocks
• Very hard to maintain
Publish-Subscribe
• Logical notion of subscription
• Trade-off between level of abstraction and performance
ORM-style
• High level of abstraction
• We don’t know what exactly happens under API calls
Compared by: development and support complexity, performance, data structure complexity
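The difference between the onMessage and publish-subscribe styles can be shown side by side. This is a minimal in-process sketch; the message shape (`type`, `payload`) and the `PubSub` class are assumptions for illustration, not an API from the slides.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Any], None]


def on_message(message: Dict[str, Any], prices: List[Any], alerts: List[Any]) -> None:
    """onMessage style: one entry point that branches on the message type."""
    if message["type"] == "price":
        prices.append(message["payload"])
    elif message["type"] == "alert":
        alerts.append(message["payload"])
    else:
        raise ValueError(f"unknown message type: {message['type']}")


class PubSub:
    """Publish-subscribe style: handlers are attached per topic."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> Callable[[], None]:
        self._handlers[topic].append(handler)

        def unsubscribe() -> None:
            if handler in self._handlers[topic]:
                self._handlers[topic].remove(handler)

        return unsubscribe

    def publish(self, topic: str, payload: Any) -> int:
        handlers = list(self._handlers.get(topic, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)  # number of subscribers that were notified
```

In the first style every new message type adds another branch; in the second, a subscription is an object the caller can create and cancel without touching the dispatch code.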
3. FAULT-TOLERANCE
CLIENT/CONNECTION IS DOWN
• Server keeps a client context; client and server exchange heartbeats
• Session/context alive timeout
• After a disconnect, the client reconnects
• Try restore context + send difference (preferable)
• Or request data again (HTTP/snapshot + WebSocket)
SERVER IS DOWN
• Client is disconnected from Server 1
• Connect other server (Server 2)
• Try restore context
• Or request data again
If streaming no longer works - switch to polling
We are no longer stateless!
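The "restore context and send the difference, or request data again" rule can be sketched with a server that remembers when each client last sent a heartbeat. This is a minimal sketch under stated assumptions: messages carry sequence numbers, time is passed in explicitly, and the class and method names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


class StreamServer:
    def __init__(self, context_ttl: float) -> None:
        self.context_ttl = context_ttl                  # context alive timeout
        self.log: List[Tuple[int, str, Any]] = []       # (seq, key, value)
        self.state: Dict[str, Any] = {}                 # current snapshot
        self.contexts: Dict[str, float] = {}            # client id -> last heartbeat

    def publish(self, key: str, value: Any) -> None:
        self.state[key] = value
        self.log.append((len(self.log) + 1, key, value))

    def heartbeat(self, client_id: str, now: float) -> None:
        self.contexts[client_id] = now

    def reconnect(self, client_id: str, last_seq: int, now: float) -> Dict[str, Any]:
        last_beat = self.contexts.get(client_id)
        alive = last_beat is not None and now - last_beat <= self.context_ttl
        self.contexts[client_id] = now
        if alive:
            # Context restored: send only what the client missed.
            missed = [entry for entry in self.log if entry[0] > last_seq]
            return {"type": "diff", "updates": missed}
        # Context expired or unknown (e.g. another server): full data again.
        return {"type": "snapshot", "seq": len(self.log), "state": dict(self.state)}
```

A client that fails over to a second server has no context there, so it naturally falls into the snapshot branch unless the servers share their contexts.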
4. SECURITY
Aspect | Request-response | Streaming
Protocol | HTTPS | WSS
Authentication | When HTTP session started | When HTTP session started
Authorization | Each client request | Beginning of the connection
Log-off | Invalidate access token and session | Invalidate access token and session; terminate WebSocket connection
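Because a streaming connection is authorized only once, at the beginning, log-off has to close open connections explicitly. The sketch below shows that bookkeeping with an in-memory registry; the class name, method names and token format are hypothetical.

```python
from typing import Dict


class StreamingGateway:
    def __init__(self, tokens: Dict[str, str]) -> None:
        self._tokens = dict(tokens)                # access token -> user
        self._connections: Dict[int, str] = {}     # connection id -> access token
        self._next_id = 1

    def open_connection(self, token: str) -> int:
        """Authorize once, when the connection is established."""
        if token not in self._tokens:
            raise PermissionError("invalid or expired access token")
        conn_id = self._next_id
        self._next_id += 1
        self._connections[conn_id] = token
        return conn_id

    def is_open(self, conn_id: int) -> bool:
        return conn_id in self._connections

    def log_off(self, token: str) -> int:
        """Invalidate the token and terminate every connection opened with it."""
        self._tokens.pop(token, None)
        closed = [c for c, t in self._connections.items() if t == token]
        for conn_id in closed:
            del self._connections[conn_id]
        return len(closed)
```

Without the termination step, a connection opened before log-off would keep receiving data even though its token is no longer valid.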
5. USING SCHEMAS
Without a schema, field names are sent in each message:
{
sensorData: {
temp: 25.5,
pressure: 751,
status: CONNECTED
}
}
With a schema (schema version = 1), used by both server and client:
Field | Type
Temp | Decimal
Pressure | Decimal
Status | CONNECTED=1, DISCONNECTED=2
Message sent from server to client: 25.5 | 751 | 1
• Don’t send field names in each message
• Need to somehow manage different schema versions
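Schema-based encoding can be sketched as follows: both sides share the field order, so the message carries only values. Prefixing each message with the schema version is an assumption made here as one possible way to manage versions; the function names and the `|` separator layout are likewise illustrative.

```python
from decimal import Decimal
from typing import Any, Dict

STATUS_CODES = {"CONNECTED": 1, "DISCONNECTED": 2}
STATUS_NAMES = {code: name for name, code in STATUS_CODES.items()}

# Schema version -> ordered field names, known to both server and client.
SCHEMAS = {1: ("temp", "pressure", "status")}


def encode(record: Dict[str, Any], version: int = 1) -> str:
    values = []
    for name in SCHEMAS[version]:
        value = record[name]
        values.append(str(STATUS_CODES[value]) if name == "status" else str(value))
    return f"{version}|" + "|".join(values)


def decode(message: str) -> Dict[str, Any]:
    version_text, *values = message.split("|")
    record: Dict[str, Any] = {}
    for name, text in zip(SCHEMAS[int(version_text)], values):
        record[name] = STATUS_NAMES[int(text)] if name == "status" else Decimal(text)
    return record
```

The sensor reading from the slide encodes to `1|25.5|751|1`, where the leading `1` is the schema version.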
7. DATA GROUPING
Merge multiple messages into one to reduce bandwidth and frequency
Messages on the server:
Time | Price | Quantity
12:40:00.100 | 121.60 | 5
12:40:00.150 | 121.95 | 10
12:40:00.600 | 121.70 | 20
12:40:01.100 | 121.75 | 50
12:40:01.900 | 121.60 | 100
Merged messages sent to the client:
Time | Max price (MAX) | Total quantity (SUM)
12:40:00 | 121.95 | 35 (5+10+20)
12:40:01 | 121.75 | 150 (50+100)
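The grouping in the tables above can be reproduced with a few lines: messages are bucketed by whole second, keeping the maximum price and the summed quantity. This is a minimal sketch assuming time strings in the `HH:MM:SS.mmm` form shown on the slide; the function name is hypothetical.

```python
from decimal import Decimal
from typing import Dict, List, Tuple

Trade = Tuple[str, Decimal, int]  # (time, price, quantity)

TRADES: List[Trade] = [
    ("12:40:00.100", Decimal("121.60"), 5),
    ("12:40:00.150", Decimal("121.95"), 10),
    ("12:40:00.600", Decimal("121.70"), 20),
    ("12:40:01.100", Decimal("121.75"), 50),
    ("12:40:01.900", Decimal("121.60"), 100),
]


def group_by_second(trades: List[Trade]) -> List[Trade]:
    """Merge all trades of the same second into one (MAX price, SUM quantity)."""
    groups: Dict[str, list] = {}
    for time, price, quantity in trades:
        second = time.split(".")[0]  # drop the milliseconds
        if second in groups:
            groups[second][0] = max(groups[second][0], price)
            groups[second][1] += quantity
        else:
            groups[second] = [price, quantity]
    return [(second, price, qty) for second, (price, qty) in groups.items()]
```

Five messages become two, at the cost of the client no longer seeing individual trades.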
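8. WRITE-BEHIND BUFFER
Modifiable buffer
Buffer contents, waiting to be sent to the client:
Time | Temperature | SensorID
12:40:00 | 23 °C | 1
12:40:00 | 30 °C | 2
UPDATE arriving for sensor 1:
Time | Temperature | SensorID
12:41:00 | 24 °C* | 1
The update replaces the buffered entry when
• Data has not been sent
• But is still in the buffer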
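A replaceable buffer can be sketched as a dictionary keyed by sensor ID: a newer reading overwrites an older one that has not been sent yet. This is a minimal sketch; the class name and the choice of key are assumptions for illustration.

```python
from typing import Any, Dict, Hashable, List, Tuple


class ReplaceableBuffer:
    def __init__(self) -> None:
        self._pending: Dict[Hashable, Any] = {}

    def put(self, key: Hashable, message: Any) -> bool:
        """Store a message; return True if it replaced an unsent one."""
        replaced = key in self._pending
        self._pending[key] = message  # an existing key keeps its position
        return replaced

    def flush(self) -> List[Tuple[Hashable, Any]]:
        """Return everything waiting to be sent and empty the buffer."""
        batch = list(self._pending.items())
        self._pending.clear()
        return batch
```

With the readings from the slide, the client receives the 12:41:00 value for sensor 1 and never sees the outdated 12:40:00 one.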
SOCK JS LIBRARY
Implements fallback
– WebSocket
– EventSource
– COMET
– Hidden IFRAME
– Polling
• Integration with Spring
• Multiplexing support
https://github.com/sockjs/websocket-multiplex
ATMOSPHERE JAVA FRAMEWORK
• Client and server (Java) components
• Transparently supports
– WebSockets
– Server-Sent Events
– Long polling
– HTTP streaming (forever frame)
• References
– https://github.com/Atmosphere/atmosphere
– http://async-io.org/tutorial.html
LIGHTSTREAMER SELF-HOSTED SERVER
• Self-hosted server
• We need to implement and deploy adapters
DATA ADAPTER
• Connects to external data sources
• Provides data to LS server
METADATA ADAPTER
• Per user/subscription
• Security and permissions
• Bandwidth/frequency limitations
• Data schemas
GOOGLE FIREBASE CLOUD SERVICE
Cloud NoSQL data storage
• Data is automatically synced to all connected devices
• Covers many issues
– Failover
– Protocol fallback
– Network
– Scalability
– Monitoring
– and many others
• Hides complexity behind the SDK
CONCLUSION
1. Real-time apps are the de facto standard now
2. Use streaming, with fallback to long polling or polling
3. Take advantage of TCP/UDP in the browser (WebSocket, Web-RTC)
4. A streaming API is fully stateful
5. Keep optimization techniques in mind when architecting a streaming API
6. Use battle-tested tools and products
REFERENCES
Real-time web technologies overview
– https://www.leggetter.co.uk/
Data streaming frameworks and services
– List: https://www.leggetter.co.uk/real-time-web-technologies-guide
– Lightstreamer: http://www.lightstreamer.com/
– SockJS: https://github.com/sockjs
– PubNub: pubnub.com
– Firebase: https://firebase.google.com/
– Atmosphere: https://github.com/Atmosphere
WebSocket
– https://samsaffron.com/archive/2015/12/29/websockets-caution-required
Server-sent events vs WebSockets
– http://streamdata.io/blog/push-sse-vs-websockets/
Server-sent events
– http://www.html5rocks.com/en/tutorials/eventsource/basics/
Push notifications
– https://www.urbanairship.com/push-notifications-explained
Push notification services with free plans
– https://onesignal.com/
– https://clevertap.com/
– https://goroost.com/
HTTP/2
– https://daniel.haxx.se/blog/2014/04/26/http2-explained/
– https://http2.github.io/
– https://tools.ietf.org/html/rfc7540
– Explanation by Daniel Stenberg, member of IETF HTTPbis working group, developer of Firefox: https://bagder.gitbooks.io/http2-explained/content/