By Kevin Schmidt (Head of Data Science at Mind Candy) Luis Vicente (Senior Data Engineer at Mind Candy) For mobile games, constant tweaks are the difference between success and failure. Data and analytics have to be available in real-time, but calculating, for example, uniqueness or newness of a data point requires a list of seen data points - both memory intensive and tricky when using real-time stream processing like Spark Streaming. Probabilistic data structures allow approximation of these properties with a fixed memory representation, and are very well suited for this kind of stream processing. Getting from the theory of approximation to a useful metric at a low error rate even for many millions of users is another story. In our talk we will look at practical ways of achieving this: which approximation we used for selection of useful metrics, why we picked a specific probabilistic data structure, how we stored it in Cassandra as a time series and how we implemented it in Spark Streaming.