2. For TSMC training use
Who Am I
• HungWei Chiu (邱宏瑋)
• Volunteer, Cloud Native Taiwan User Group
• Personal website: https://hwchiu.com
• Personal fan page: 矽谷牛的耕田筆記
• Author of the book「矽谷工程師教你 Kubernetes, 史上最全 CI/CD 中文應用指南」
• Kubernetes enterprise consulting and training
• Microsoft MVP (Cloud and Data Center Management)
3. Observability
• Observability can typically be categorized into three areas:
• Metrics
• Logging
• Tracing
• Each area has its own technology stack to facilitate effective monitoring
and understanding of systems and applications.
5. Observability
• All technology stacks share very similar components in their workflow
• Application
• Collector
• Processor/Analyzer
• Visualizer
• These components play vital roles in the observability process, helping to
gather, analyze, and visualize data from various sources to gain valuable
insights into system performance and behavior.
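The four roles above can be sketched as a tiny pipeline. This is a minimal illustration, not how any particular stack is implemented: the function names, the sample data, and the latency metric are all assumptions made up for this example.

```python
from collections import defaultdict
from statistics import mean

def application():
    """Application: emits raw data points (here, request latencies in ms)."""
    return [
        {"service": "api", "latency_ms": 120},
        {"service": "api", "latency_ms": 80},
        {"service": "db", "latency_ms": 15},
    ]

def collector(samples):
    """Collector: gathers samples and groups them by source."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample["service"]].append(sample["latency_ms"])
    return dict(grouped)

def processor(grouped):
    """Processor/Analyzer: reduces raw samples to an aggregate per service."""
    return {service: mean(values) for service, values in grouped.items()}

def visualizer(averages):
    """Visualizer: renders one human-readable line per service."""
    return [f"{service}: avg {avg:.1f} ms" for service, avg in sorted(averages.items())]

lines = visualizer(processor(collector(application())))
print("\n".join(lines))
```

In a real stack each stage is a separate system (for example an exporter, an agent, a storage/query backend, and Grafana), but the data still flows in this order.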
6. Observability
• Potential Challenges
• Deployment/Installation
• Architecture
• Configuration
• Tuning
• Troubleshooting
• User Interface
• User Management + Authentication/Authorization
• Multi Tenancy
7. Observability
• We need to provide a user-friendly UI for developers to view metrics, logs,
and tracing data.
• Grafana, Kibana, and OpenTelemetry
• Developers should be able to use a single account (username/password) to
access all these services.
• System administrators should have an effortless way to manage user
permissions, based on user/group/role assignments, and more.
• Additionally, integration with external user management services such as
LDAP, OIDC, Keycloak, Azure AD, and Google OAuth should be easy.
8. Grafana Labs
• Grafana Labs has expanded its horizons and actively developed multiple open-
source projects in the Observability (O11y) domain, extending beyond Grafana
itself.
• Grafana serves as the unified UI for various components
• Metrics/Logs/Tracing
• Mimir/Loki+Promtail/Tempo
• Agent
• Profiling
• Phlare -> Pyroscope
12. Prometheus
• Not a long-term storage solution
• Local Disk Only
• Not designed to be scaled horizontally
• Has to rely on remote_write/remote_read to integrate with external storage
• People usually opt for other solutions in production environments
15. Grafana Mimir
• Similar to other solutions
• Cortex, VictoriaMetrics, Thanos
• Long term storage solution for Prometheus
• Based on object storage
• Same as Cortex/Thanos
• VictoriaMetrics is block storage based
• High Availability
• Scalability (Microservice-based architecture)
17. Grafana Agent
• Grafana Labs wants to extend its capabilities
• The new OSS Grafana Agent implements all the functions of
Prometheus Agent.
• v0.35.0
• Run a PoC first before applying it to a production environment.
• Supports all Prometheus CRDs, so you can easily migrate from Prometheus
Operator.
19. Metrics Mimir
• Mimir has a microservice-based architecture with multiple horizontally
scalable services that can run separately and in parallel.
• All functions are compiled into a single binary, and we can specify which
function (component) each service instance should act as.
• Supports 3 different deployment modes
• Monolithic
• Read-Write
• Microservice mode
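The "single binary, pick your role" idea can be sketched as follows. Mimir exposes this choice through its `-target` setting; everything else here is an assumption for illustration: the component list is simplified, and the grouping used for read-write mode is hypothetical rather than Mimir's exact mapping.

```python
# Simplified component list (an assumption; real Mimir ships more components).
ALL_COMPONENTS = [
    "distributor", "ingester", "querier",
    "query-frontend", "store-gateway", "compactor",
]

# Hypothetical grouping for read-write mode, for illustration only.
GROUPS = {
    "write": ["distributor", "ingester"],
    "read": ["query-frontend", "querier"],
    "backend": ["store-gateway", "compactor"],
}

def resolve_target(target):
    """Return the components one process would run for a given target."""
    if target == "all":              # monolithic mode: everything in one process
        return list(ALL_COMPONENTS)
    if target in GROUPS:             # read-write mode: a few coarse groups
        return list(GROUPS[target])
    if target in ALL_COMPONENTS:     # microservice mode: one component per process
        return [target]
    raise ValueError(f"unknown target: {target}")

for t in ("all", "write", "ingester"):
    print(t, "->", resolve_target(t))
```

The same binary is deployed in every mode; only the target value, and therefore the set of components started, differs.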
27. Logging
• EFK/ELK/OpenSearch are widely recognized logging solution stacks.
• However, for most users, their primary requirement is log collection and
filtering, which isn't the main focus of Elasticsearch's extensive capabilities.
• Managing Elasticsearch can be quite complex due to its clustered nature.
• If your logging needs don't involve index-based analysis and you seek a more
straightforward solution, Grafana Loki is worth considering.
• Grafana's unified GUI serves as the central interface for all aspects
• Loki can efficiently handle logging data without the complexities of a full
Elasticsearch setup. This makes it an appealing choice for logging-specific
use cases.
29. Logging
• Loki provides support for various log collectors, including fluent-bit,
fluentd, logstash, and others
• Grafana offers its lightweight log collector called Promtail.
• Grafana agent is also capable of collecting log messages.
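Whatever collector is used, it ends up sending log lines to Loki over HTTP. The sketch below builds (but does not send) a request body in the shape accepted by Loki's JSON push endpoint, `/loki/api/v1/push`, and sets the `X-Scope-OrgID` header that Loki uses to identify a tenant. The helper name `build_push_request`, the labels, and the tenant name are assumptions for illustration.

```python
import json
import time

def build_push_request(labels, lines, tenant=None, ts_ns=None):
    """Build headers and a JSON body for a Loki-style push request.

    labels: dict of stream labels, e.g. {"app": "demo"}
    lines:  list of log lines belonging to that stream
    tenant: optional tenant ID, sent as the X-Scope-OrgID header
    ts_ns:  optional timestamp in nanoseconds (defaults to now)
    """
    # Loki expects the timestamp as a string of nanoseconds since the epoch.
    ts = str(ts_ns if ts_ns is not None else time.time_ns())
    body = {
        "streams": [
            {"stream": dict(labels), "values": [[ts, line] for line in lines]}
        ]
    }
    headers = {"Content-Type": "application/json"}
    if tenant:
        headers["X-Scope-OrgID"] = tenant
    return headers, json.dumps(body)

headers, body = build_push_request(
    {"app": "demo", "env": "dev"}, ["hello loki"], tenant="team-a", ts_ns=1
)
print(headers)
print(body)
```

In practice Promtail, Grafana Agent, or fluent-bit handles batching, retries, and label discovery; the point here is only the shape of the data a collector hands to Loki.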
32. Loki
• Loki is a horizontally scalable, highly available, multi-tenant log aggregation system.
• It’s designed to be very cost effective and easy to operate
• Object storage
• It doesn’t index the content of the logs, only the set of labels.
• Has its own LogQL query language.
• Supports 3 different deployment modes
• Monolithic
• Read-Write
• Microservice mode
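To make "indexes only the labels, not the content" concrete, here is a toy in-memory store. It is a sketch of the idea only and assumes nothing about Loki's real storage format: the class name and data are made up. The query corresponds roughly to the LogQL expression `{app="api"} |= "error"`: labels select the streams, and the text filter is applied by scanning the selected lines.

```python
from collections import defaultdict

class TinyLabelIndexedStore:
    """Toy log store: streams are keyed by their label set; content is not indexed."""

    def __init__(self):
        # frozenset of (label, value) pairs -> list of raw log lines
        self._streams = defaultdict(list)

    def push(self, labels, line):
        self._streams[frozenset(labels.items())].append(line)

    def query(self, selector, contains=""):
        wanted = set(selector.items())
        hits = []
        for label_set, lines in self._streams.items():
            if wanted <= label_set:  # cheap step: match on labels only
                # expensive step: brute-force scan of the selected streams
                hits.extend(line for line in lines if contains in line)
        return hits

store = TinyLabelIndexedStore()
store.push({"app": "api", "env": "prod"}, "error: timeout")
store.push({"app": "api", "env": "prod"}, "request ok")
store.push({"app": "db", "env": "prod"}, "error: disk full")

hits = store.query({"app": "api"}, contains="error")
print(hits)
```

The trade-off is a much smaller index at the cost of scanning log content at query time, which is why narrowing the stream selection with labels matters.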
33. Loki
• Very similar to Mimir, Loki has the following components
• Distributor
• Ingester
• Query-frontend
• Querier
• If you understand how Mimir works, you can quickly grasp how Loki
functions as well
36. Tempo
• People sometimes get confused about Jaeger and OTel components
• More projects, more complexity
• Grafana has developed its own solution for distributed tracing, Tempo
• Tempo is a cost-effective, easy-to-operate, high-volume distributed tracing
backend.
• Object storage for long-term storage, Redis/Memcached for increased
performance.
• Microservice-based architecture
37. Tempo
• Supports traces from
• Jaeger
• Zipkin
• OpenTelemetry
• Flexible, you can chain any components in your data path.
• Highly integrated with Grafana, Mimir, Prometheus and Loki
• Couples metrics, logs, and traces in a single GUI to enhance the
troubleshooting experience.
39. Tempo
• Tempo’s architecture is similar to Mimir and Loki, with the following
components
• Distributor
• Ingester
• Query-Frontend
• Querier
• Compactor
• Metrics-generator
44. Others
• Continuous Profiling
• Grafana Phlare
• Was archived after Grafana acquired Pyroscope on 2023-03-15
• Maybe we will see a Pyroscope-based solution in the Grafana ecosystem
soon.