2012/02/04
Working at NHN Japan
            we are hiring!


What we are doing about logs with fluentd


                   data mining
                      reporting
                  page views, unique users,
                   traffic amount per page,
                                 ...
What we are doing about logs with fluentd




             super-large-scale
               'sed | grep | wc'-like processes

Why fluentd? (not Storm, Kafka or Flume?)



                Ruby, Ruby, Ruby! (NOT Java!)
                   we work in a lightweight-language culture


                  easy to try, easy to patch
                  plugin model architecture
             built-in TimeSlicedOutput mechanism


What I'll talk about today


                      What we are trying with fluentd

                  How we did, and how we are doing now

             What are distributed stream processing topologies like?

                What is important about stream processing

                          Implementation details

                                (appendix)


Architecture in last week's presentation
[Diagram] web servers → deliver servers (scribed) → archive servers
(scribed, large-volume RAID) and a Fluentd cluster:
  - deliver sends data to both archive servers and Fluentd workers (as stream)
  - Fluentd converts logs into structured data and writes them to HDFS (as stream)
  - past logs are imported and converted on demand (as batch)
  - aggregation queries run on demand against the Hadoop cluster (Hive)
    via Shib (Hadoop Hive web client)
Now
[Diagram] the same architecture, but the deliver servers now run Fluentd
(in place of scribed), and a Fluentd Watcher node has been added.
Archive servers (scribed, large-volume RAID), Fluentd cluster,
Hadoop cluster (Hive), and the Shib web client remain as before.
Fluentd in production service




                             10 days



Scale of Fluentd processes




                from 127 Web Servers
                     146 log streams


Scale of Fluentd processes




                 70,000 messages/sec
                          120 Mbps
                      (at peak time)


Scale of Fluentd processes




                         650 GB/day
                          (non-blog: 100GB)




Scale of Fluentd processes




                 89 fluentd instances
                                on
                  12 nodes (4Core HT)


We can't go back.




                             crouton by kbysmnr
What we are trying with fluentd

                      log conversion

                          from: raw log
               (Apache combined-like format)

             to: structured and query-friendly log
             (TAB-separated, some fields masked,
                     many flags added)
What we are trying with fluentd

                               log conversion
 99.999.999.99 - - [03/Feb/2012:10:59:48 +0900] "GET /article/detail/6246245/ HTTP/1.1" 200
  17509 "http://news.livedoor.com/topics/detail/6246245/" "Mozilla/4.0 (compatible; MSIE 8.0;
  Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR
         3.0.30729; Media Center PC 6.0; InfoPath.1; .NET4.0C)" "news.livedoor.com"
                              "xxxxxxx.xx.xxxxxxx.xxx" "-" 163266



        152930 news.livedoor.com /topics/detail/6242972/ GET 302 210 226 - 99.999.999.99
       TQmljv9QtXkpNtCSuWVGGg Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X)
      AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A406 Safari/7534.48.3 TRUE
                            TRUE FALSE FALSE FALSE FALSE FALSE
     hhmmss vhost path method status bytes duration referer rhost userlabel agent FLAG [FLAGS]
     FLAGS: status_redirection status_errors rhost_internal suffix_miscfile suffix_imagefile agent_bot
                                       FLAG: logical OR of FLAGS
             userlabel: hash of (tracking cookie / terminal id (mobile phone) / rhost+agent)

What we are trying with fluentd

             TimeSlicedOutput of fluentd

             Traditional 'log rotation' is important,
                       but troublesome.

                             We want:
             the 2/3 23:59:59 log in access.0203_23.log
             the 2/4 00:00:00 log in access.0204_00.log
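fluentd's built-in out_file is one TimeSlicedOutput; a minimal config sketch of the behavior we want (the tag and path here are hypothetical):

```
<match converted.blog>
  type file
  path /var/log/fluent/access
  time_slice_format %m%d_%H    # slices files hourly: access.0203_23.log, access.0204_00.log
  time_slice_wait 10m          # wait for late records just before the boundary
</match>
```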
How we did, and how we are doing now

                        collect


             archive


                        convert


                       aggregate


                         show
How we did it in the past (2011)

                 collect (scribed)
                   → stream → archive (scribed)
                   → stream → store to hdfs (hourly/daily)
                   → convert (Hadoop Streaming)
                   → aggregate (Hive), on demand
                   → show, on demand

                 HIGH LATENCY:
                   time to flush + hourly invocation + running time
                   = 20-25 mins
How we are doing now

                 collect (Fluentd)
                   → stream → archive (scribed)
                   → stream → convert (Fluentd): stream conversion
                   → store to hdfs (over Cloudera's Hoop)
                   → aggregate (Hive), on demand
                   → show, on demand

                 VERY LOW LATENCY:
                   2-3 minutes (only the time to wait for buffer flush)

             break.

What is important about stream processing


                     reasonable efficiency
                          (compared with batch throughput)

             easy to re-run the same conversion as a batch

                             no SPOF

                 easy to add/remove nodes
Stream processing and batch


             How do we re-run a conversion as a batch
                  when we run into trouble?


             We want to use just one converter
             program for both stream processes
                   and batch processes!
out_exec_filter (fluentd built-in plugin)


                      1. fork and exec the 'command' program

             2. write data to the child process's stdin as TAB-separated
             fields specified by 'in_keys' (for tags, remove_prefix is available)

                3. read data from the child process's stdout as TAB-separated
              fields named by 'out_keys' (for tags, add_prefix is available)

              4. set the message's timestamp from the 'time_key' value in the
               parsed data, in the format specified by 'time_format'

'out_exec_filter' and 'Hadoop Streaming'

               read from stdin / write to stdout
             TAB separated values as input/output
                            WOW!!!!!!!


             difference: a 'tag' field may be needed
                       with out_exec_filter
             simple solution: if it doesn't exist, ignore it.
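A tiny sketch of that idea in Ruby (the real converter is a Perl script; the tag detection and the placeholder conversion here are assumptions): one function usable both from out_exec_filter, where input lines carry a leading tag field, and from Hadoop Streaming, where they don't.

```ruby
# convert_line works for both input shapes: with a leading TSV 'tag' field
# (out_exec_filter) or without it (Hadoop Streaming). The actual
# parsing/masking logic is replaced by String#upcase as a placeholder.
def convert_line(line, known_tags)
  fields = line.chomp.split("\t", 2)
  if fields.size == 2 && known_tags.include?(fields[0])
    tag, raw = fields                    # stream mode: keep the tag
  else
    tag, raw = nil, line.chomp           # batch mode: no tag present
  end
  converted = raw.upcase                 # placeholder for the real conversion
  (tag ? [tag, converted] : [converted]).join("\t")
end

# Both modes can then share one driver loop:
#   STDIN.each_line { |l| puts convert_line(l, %w[blog news]) }
```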
What is important about stream processing


                     reasonable efficiency
                          (compared with batch throughput)

             easy to re-run the same conversion as a batch

                             no SPOF

                 easy to add/remove nodes
What are distributed stream processing topologies like?

[Diagram] web servers → deliver nodes → worker nodes (many) →
serializer nodes → HDFS (Hoop Server);
the delivers also feed the archiver and its backup.

     Redundancy and load balancing
     MUST be guaranteed anywhere.
Deliver nodes

[Diagram: deliver nodes in the topology]

          Accept connections from web servers,
          copy messages and send to:
            1. archiver (and its backup)
            2. convert workers (w/ load balancing)
            3. and ...

          useful for casual worker append/remove
Worker nodes

[Diagram: worker nodes in the topology]

          Under load balancing,
          run as many workers as you want.
Serializer nodes

[Diagram: serializer nodes in the topology]

          Receive the converted data stream from workers,
          aggregate by service, and:
            1. write to storage (hdfs/hoop)
            2. and ...

          useful to reduce storage overhead from many
          concurrent write operations
Watcher nodes

[Diagram: watcher nodes in the topology]

          Watching data for real-time workload reports
          and trouble notifications:
            1. raw data from the delivers
            2. structured data from the serializers
                              break.
Implementation details


              log agents on servers    (scribeline)

              deliver            (copy, in_scribe, out_scribe, out_forward)

              worker             (in/out_forward, out_exec_filter)

              serializer/hooper  (in/out_forward, out_hoop)

              watcher            (in_forward, out_flowcounter, out_growthforecast)


log agent: scribeline


             log delivery agent tool: Python 2.4, scribe/thrift
                             easy to set up and start/stop
                      works with any httpd configuration updates
                           works with logrotate-ed log files
                      automatic delivery-target failover/takeback


                        (NEW) cluster support
                     (random selection from the server list)

             https://github.com/tagomoris/scribe_line


From scribeline To deliver

[Diagram] scribeline on the web servers sends messages
(category: blog, message: RAW LOG, Apache combined + α)
over scribe to the primary deliver server's fluentd (in_scribe),
with failover to a secondary deliver server's fluentd (in_scribe).
From scribeline To deliver

[Diagram] xNN web servers →
  deliver 01 (primary)
  deliver 02 (secondary)
  deliver 03 (primary for high-throughput nodes)
with x8 fluentd instances per deliver node.
deliver node internal routing

         deliver server (primary): x8 fluentd instances

         deliver fluentd routing:
           in_scribe
             add_prefix scribe
             remove_newline true
           copy scribe.* →
             out_scribe
               host archive.server.local
               remove_prefix scribe
               add_newline true
             out_flowcounter (see later..)
             roundrobin (see next) → out_forward (see later with out_flowcounter..)

         incoming: category: blog, message: RAW LOG
         emitted:  time: received_at, tag: scribe.blog, message: RAW LOG
deliver node: roundrobin strategy to workers

         roundrobin: x56 substore configurations (7 workers x 8 instances)

           out_forward: server worker01 port 24211 / secondary worker02 port 24211
           out_forward: server worker01 port 24212 / secondary worker03 port 24212
           out_forward: server worker01 port 24213 / secondary worker04 port 24213
           out_forward: server worker01 port 24214 / secondary worker05 port 24214

         message: time: received_at, tag: scribe.blog, message: RAW LOG
From deliver To worker

[Diagram] deliver fluentd (copy scribe.* → roundrobin → out_forward)
sends messages (time: received_at, tag: scribe.blog, message: RAW LOG)
to in_forward of the worker fluentd instances
(worker fluentd Xn1 on worker server X, Yn2 on worker server Y).
worker node internal routing

         worker server: x8 worker instances, x1 serializer instance

         worker fluentd:
           in_forward
           out_exec_filter scribe.*
             command: convert.sh
             in_keys: tag,message
             remove_prefix: scribe
             out_keys: .......
             add_prefix: converted
             time_key: timefield
             time_format: %Y%m%d%H%M%S
           out_forward converted.*

         serializer fluentd:
           in_forward
           out_hoop converted.blog
             hoop_server servername.local
             username
             path /on_hdfs/%Y%m%d/blog-%H.log
           out_hoop converted.news
             path /on_hdfs/%Y%m%d/news-%H.log

         input:  time: received_at, tag: scribe.blog, message: RAW LOG
         output: time: written_time, tag: converted.blog, [many data fields]
                 (TAB separated text data)
out_exec_filter (review.)


                      1. fork and exec the 'command' program

             2. write data to the child process's stdin as TAB-separated
             fields specified by 'in_keys' (for tags, remove_prefix is available)

                3. read data from the child process's stdout as TAB-separated
              fields named by 'out_keys' (for tags, add_prefix is available)

              4. set the message's timestamp from the 'time_key' value in the
               parsed data, in the format specified by 'time_format'

out_exec_filter behavior details

         worker fluentd:
           out_exec_filter scribe.*
             command: convert.sh    in_keys: tag,message    remove_prefix: scribe
             out_keys: .......      add_prefix: converted
             time_key: timefield    time_format: %Y%m%d%H%M%S

         input message:  time: received_at, tag: scribe.blog, message: RAW LOG
           → stdin of the forked process (convert.sh -> perl convert.pl):
               blog<TAB>RAWLOG
           → stdout:
               blog<TAB>20120204175035<TAB>field1<TAB>field2.....
         output message: time: 2012/02/04 17:50:35, tag: converted.blog,
                         path:... agent:... referer:... flag1:TRUE
From serializer To HDFS (Hoop)

[Diagram] the serializer fluentd (in_forward) on the worker server
writes over HTTP to the Hoop Server on the Hadoop NameNode:
  out_hoop converted.blog
    hoop_server servername.local
    username
    path /on_hdfs/%Y%m%d/blog-%H.log
  out_hoop converted.news
    path /on_hdfs/%Y%m%d/news-%H.log

input: time: written_time, tag: converted.blog, [many data fields]
written into HDFS as TAB separated text data
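One out_hoop section written out as a fluent-plugin-hoop config sketch (the server name and username are placeholders):

```
<match converted.blog>
  type hoop
  hoop_server hoop-server.local:14000    # placeholder host:port
  username hoopuser                      # placeholder user
  path /on_hdfs/%Y%m%d/blog-%H.log       # time-sliced path on HDFS
</match>
```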
Overview

[Diagram] the full topology: web servers → deliver node cluster →
worker node cluster → serializers → HDFS (Hoop Server),
with the archiver and its backup fed by the delivers.
Traffics: Bytes/sec (on deliver 2/3-4)

         [graph: bytes/sec]




Traffics: Messages/sec (on deliver 2/3-4)

         [graph: messages/sec]




Traffic/CPU/Load/Memory: deliver nodes (2/3-4)




Traffics: workers network traffics total

         [graph: total network traffic of all workers]




Traffic/CPU/Load/Memory: a worker (2/3-4)




Fluentd stream processing

                     Finally, it works fine now.
             Log conversion latency is dramatically reduced.


             Many useful plugins for monitoring are waiting
                           to be shipped.

             Hundreds of cool features to implement are also
                             waiting for us!
Thank you!






             Appendix

input traffics: by fluent-plugin-flowcounter

         deliver server (primary): x8 fluentd instances

         deliver fluentd routing:
           in_scribe
             add_prefix scribe
             remove_newline true
           copy scribe.* →
             out_scribe
               host archive.server.local
               remove_prefix scribe
               add_newline true
             out_flowcounter
             roundrobin → out_forward

         incoming: category: blog, message: RAW LOG
         emitted:  time: received_at, tag: scribe.blog, message: RAW LOG
bytes/messages counting on fluentd

             1. 'out_flowcounter' counts input messages, their size
                  (of specified fields) and their rates (/sec)

             2. Counting results are emitted per minute/hour/day

             3. Worker fluentd sends the results to the 'Watcher' node
                             over out_forward

             4. The Watcher receives the counting results and passes
                         them to 'out_growthforecast'

             'GrowthForecast' is a graph drawing tool with a REST
                  API for data registration, by kazeburo
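The four steps can be sketched as two config fragments; hostnames, tags, and the GrowthForecast URL are assumptions, and in the real deployment out_flowcounter sits inside the deliver's copy section:

```
# on deliver/worker fluentd (steps 1-2)
<match scribe.*>
  type flowcounter
  count_keys message        # count messages and bytes of this field
  unit minute               # emit counting results every minute
  tag flowcount
</match>
<match flowcount>
  type forward              # step 3: send results to the Watcher node
  <server>
    host watcher01.local
  </server>
</match>

# on the Watcher fluentd (step 4)
<match flowcount>
  type growthforecast       # POST to GrowthForecast's REST API
  gfapi_url http://gf.local:5125/api/
  service fluentd
  section traffic
  name_keys count,bytes,count_rate,bytes_rate
</match>
```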

Why not out_forward roundrobin in deliver?

             out_forward's round-robin is per buffer
                           flush!
              (per buffer size, or flush_interval)


                 For a high-throughput stream,
                      this unit is too large.
               We need round-robin per 'emit'.
deliver node: roundrobin strategy to workers

         roundrobin: x56 substore configurations (7 workers x 8 instances)

           out_forward: server worker01 port 24211 / secondary worker02 port 24211
           out_forward: server worker01 port 24212 / secondary worker03 port 24212
           out_forward: server worker01 port 24213 / secondary worker04 port 24213
           out_forward: server worker01 port 24214 / secondary worker05 port 24214

         message: time: received_at, tag: scribe.blog, message: RAW LOG
From worker To serializer: details

     worker server: x8 worker instances, x1 serializer instance

         worker fluentd:
           out_forward converted.*
             server: localhost
             secondary: worker1, worker2, worker3, worker4,
                        worker5, worker6, worker7
         serializer fluentd:
           in_forward

                         normally, send to localhost;
                in trouble, balance all traffic to all
                     the other workers' serializers
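One way to sketch this in an out_forward section (hostnames and port are hypothetical; 'standby' servers are only used when the primary dies):

```
<match converted.*>
  type forward
  <server>
    host localhost          # normally: the serializer on this node
    port 24224
  </server>
  <server>
    host worker1.local      # in trouble: another worker's serializer
    port 24224
    standby
  </server>
  <server>
    host worker2.local
    port 24224
    standby
  </server>
</match>
```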
Software list:
             scribed:              github.com/facebook/scribe/
             scribeline:           github.com/tagomoris/scribe_line
             fluent-plugin-scribe: github.com/fluent/fluent-plugin-scribe

             Hoop:               http://cloudera.github.com/hoop/docs/latest/ServerSetup.html
             fluent-plugin-hoop: github.com/fluent/fluent-plugin-hoop

             GrowthForecast:               github.com/kazeburo/growthforecast
             fluent-plugin-growthforecast: github.com/tagomoris/fluent-plugin-growthforecast
             fluent-plugin-flowcounter:    github.com/tagomoris/fluent-plugin-flowcounter

Debugging & Tuning in Spark
 
Like loggly using open source
Like loggly using open sourceLike loggly using open source
Like loggly using open source
 
Redis 101
Redis 101Redis 101
Redis 101
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
 
Transactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric LiangTransactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric Liang
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
 
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEKafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
 
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopTez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
 

En vedette

Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableFluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableShu Ting Tseng
 
Fluentd and Kafka
Fluentd and KafkaFluentd and Kafka
Fluentd and KafkaN Masahiro
 
Open Source Monitoring Tools
Open Source Monitoring ToolsOpen Source Monitoring Tools
Open Source Monitoring Toolsm_richardson
 
Fluentd Hacking Guide at RubyKaigi 2014
Fluentd Hacking Guide at RubyKaigi 2014Fluentd Hacking Guide at RubyKaigi 2014
Fluentd Hacking Guide at RubyKaigi 2014Naotoshi Seo
 
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012Treasure Data, Inc.
 
Fluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API DetailsFluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API DetailsSATOSHI TAGOMORI
 
Log management with Graylog2 - FrOSCon 2012
Log management with Graylog2 - FrOSCon 2012Log management with Graylog2 - FrOSCon 2012
Log management with Graylog2 - FrOSCon 2012lennartkoopmann
 
Graylog Engineering - Design Your Architecture
Graylog Engineering - Design Your ArchitectureGraylog Engineering - Design Your Architecture
Graylog Engineering - Design Your ArchitectureGraylog
 
Building a Sustainable Data Platform on AWS
Building a Sustainable Data Platform on AWSBuilding a Sustainable Data Platform on AWS
Building a Sustainable Data Platform on AWSSmartNews, Inc.
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Amazon Web Services
 
Life of an Fluentd event
Life of an Fluentd eventLife of an Fluentd event
Life of an Fluentd eventKiyoto Tamura
 
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭台灣資料科學年會
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
 

En vedette (16)

Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableFluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, Scalable
 
The basics of fluentd
The basics of fluentdThe basics of fluentd
The basics of fluentd
 
Fluentd and Kafka
Fluentd and KafkaFluentd and Kafka
Fluentd and Kafka
 
Open Source Monitoring Tools
Open Source Monitoring ToolsOpen Source Monitoring Tools
Open Source Monitoring Tools
 
Fluentd Hacking Guide at RubyKaigi 2014
Fluentd Hacking Guide at RubyKaigi 2014Fluentd Hacking Guide at RubyKaigi 2014
Fluentd Hacking Guide at RubyKaigi 2014
 
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012
 
Fluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API DetailsFluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API Details
 
Log management with Graylog2 - FrOSCon 2012
Log management with Graylog2 - FrOSCon 2012Log management with Graylog2 - FrOSCon 2012
Log management with Graylog2 - FrOSCon 2012
 
The basics of fluentd
The basics of fluentdThe basics of fluentd
The basics of fluentd
 
Graylog Engineering - Design Your Architecture
Graylog Engineering - Design Your ArchitectureGraylog Engineering - Design Your Architecture
Graylog Engineering - Design Your Architecture
 
Building a Sustainable Data Platform on AWS
Building a Sustainable Data Platform on AWSBuilding a Sustainable Data Platform on AWS
Building a Sustainable Data Platform on AWS
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
 
Life of an Fluentd event
Life of an Fluentd eventLife of an Fluentd event
Life of an Fluentd event
 
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
 
Fluentd at SlideShare
Fluentd at SlideShareFluentd at SlideShare
Fluentd at SlideShare
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 

Similaire à Distributed Stream Processing on Fluentd / #fluentd

Architecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric BaldeschwielerArchitecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric Baldeschwielerlucenerevolution
 
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With HadoopCloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With HadoopCloudera, Inc.
 
Strata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureStrata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureCloudera, Inc.
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastuctureDataWorks Summit
 
New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2DataWorks Summit
 
Hadoop and subsystems in livedoor #Hcj11f
Hadoop and subsystems in livedoor #Hcj11fHadoop and subsystems in livedoor #Hcj11f
Hadoop and subsystems in livedoor #Hcj11fSATOSHI TAGOMORI
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messagesyarapavan
 
2011 05-12 nosql-progressive.net
2011 05-12 nosql-progressive.net2011 05-12 nosql-progressive.net
2011 05-12 nosql-progressive.netMårten Gustafson
 
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...Cloudera, Inc.
 
Seattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapRSeattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapRclive boulton
 
Setting up Storage Features in Windows Server 2012
Setting up Storage Features in Windows Server 2012Setting up Storage Features in Windows Server 2012
Setting up Storage Features in Windows Server 2012Lai Yoong Seng
 
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2Cloudera, Inc.
 
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Lucidworks
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Datacwensel
 
Hadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderHadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderDmitry Makarchuk
 
Storage Developer Conference - 09/19/2012
Storage Developer Conference - 09/19/2012Storage Developer Conference - 09/19/2012
Storage Developer Conference - 09/19/2012Ceph Community
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Rajan Kanitkar
 
Savanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStackSavanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStackMirantis
 

Similaire à Distributed Stream Processing on Fluentd / #fluentd (20)

Architecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric BaldeschwielerArchitecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric Baldeschwieler
 
Hadoop on Virtual Machines
Hadoop on Virtual MachinesHadoop on Virtual Machines
Hadoop on Virtual Machines
 
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With HadoopCloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
 
Strata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureStrata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and Future
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastucture
 
New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2
 
Hadoop and subsystems in livedoor #Hcj11f
Hadoop and subsystems in livedoor #Hcj11fHadoop and subsystems in livedoor #Hcj11f
Hadoop and subsystems in livedoor #Hcj11f
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messages
 
2011 05-12 nosql-progressive.net
2011 05-12 nosql-progressive.net2011 05-12 nosql-progressive.net
2011 05-12 nosql-progressive.net
 
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
 
Seattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapRSeattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapR
 
Setting up Storage Features in Windows Server 2012
Setting up Storage Features in Windows Server 2012Setting up Storage Features in Windows Server 2012
Setting up Storage Features in Windows Server 2012
 
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
 
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Data
 
Hadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderHadoop and mysql by Chris Schneider
Hadoop and mysql by Chris Schneider
 
Storage Developer Conference - 09/19/2012
Storage Developer Conference - 09/19/2012Storage Developer Conference - 09/19/2012
Storage Developer Conference - 09/19/2012
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014
 
ElephantDB
ElephantDBElephantDB
ElephantDB
 
Savanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStackSavanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStack
 

Plus de SATOSHI TAGOMORI

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speedSATOSHI TAGOMORI
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsSATOSHI TAGOMORI
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of RubySATOSHI TAGOMORI
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)SATOSHI TAGOMORI
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script ConfusingSATOSHI TAGOMORI
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubySATOSHI TAGOMORI
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsSATOSHI TAGOMORI
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the WorldSATOSHI TAGOMORI
 
Planet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamSATOSHI TAGOMORI
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessSATOSHI TAGOMORI
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage SystemsSATOSHI TAGOMORI
 
Perfect Norikra 2nd Season
Perfect Norikra 2nd SeasonPerfect Norikra 2nd Season
Perfect Norikra 2nd SeasonSATOSHI TAGOMORI
 
To Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToTo Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToSATOSHI TAGOMORI
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In RubySATOSHI TAGOMORI
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldSATOSHI TAGOMORI
 
Open Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceOpen Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceSATOSHI TAGOMORI
 
How to Make Norikra Perfect
How to Make Norikra PerfectHow to Make Norikra Perfect
How to Make Norikra PerfectSATOSHI TAGOMORI
 
Distributed Logging Architecture in Container Era
Distributed Logging Architecture in Container EraDistributed Logging Architecture in Container Era
Distributed Logging Architecture in Container EraSATOSHI TAGOMORI
 

Plus de SATOSHI TAGOMORI (20)

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speed
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/Operations
 
Maccro Strikes Back
Maccro Strikes BackMaccro Strikes Back
Maccro Strikes Back
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of Ruby
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script Confusing
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in Ruby
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive Operations
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the World
 
Planet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: Bigdam
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise Business
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
Perfect Norikra 2nd Season
Perfect Norikra 2nd SeasonPerfect Norikra 2nd Season
Perfect Norikra 2nd Season
 
Fluentd 101
Fluentd 101Fluentd 101
Fluentd 101
 
To Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToTo Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT To
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In Ruby
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real World
 
Open Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceOpen Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud Service
 
How to Make Norikra Perfect
How to Make Norikra PerfectHow to Make Norikra Perfect
How to Make Norikra Perfect
 
Distributed Logging Architecture in Container Era
Distributed Logging Architecture in Container EraDistributed Logging Architecture in Container Era
Distributed Logging Architecture in Container Era
 

Dernier

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 

Dernier (20)

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf

Distributed Stream Processing on Fluentd / #fluentd

  • 1. 12 2 4
  • 2. 12 2 4
  • 3. Working at NHN Japan we are hiring!
  • 4. What we are doing about logs with fluentd: data mining, reporting (page views, unique users, traffic amount per page, ...)
  • 5. What we are doing about logs with fluentd: super large scale 'sed | grep | wc' like processes
  • 6. Why fluentd? (not Storm, Kafka or Flume?) Ruby, Ruby, Ruby! (NOT Java!) We are working in a lightweight language culture: easy to try, easy to patch. Plugin model architecture. Built-in TimeSlicedOutput mechanism.
  • 7. What I talk about today: What we are trying with fluentd; How we did it, and how we are doing now; What are distributed stream process topologies like?; What is important about stream processing; Implementation details; (appendix)
  • 8. Architecture in last week's presentation: web servers send data both to archive servers (scribed, large volume RAID) and to the Fluentd cluster (as stream) via deliver servers (scribed); the Fluentd cluster converts logs into structured data and writes them to HDFS on demand (as stream); past logs are imported and converted on demand (as batch); Hadoop/Hive clusters are queried via the Shib web client (aggregation queries).
  • 9. Now: the same architecture, with deliver servers running Fluentd instead of scribed, plus a Fluentd Watcher.
  • 10. Fluentd in production service: 10 days
  • 11. Scale of Fluentd processes: from 127 Web Servers, 146 log streams
  • 12. Scale of Fluentd processes: 70,000 messages/sec, 120 Mbps (at peak time)
  • 13. Scale of Fluentd processes: 650 GB/day (non-blog: 100GB)
  • 14. Scale of Fluentd processes: 89 fluentd instances on 12 nodes (4Core HT)
  • 15. We can't go back. crouton by kbysmnr
  • 16. What we are trying with fluentd: log conversion from: raw log (apache combined like format) to: structured and query-friendly log (TAB separated, masked some fields, many flags added)
  • 17. What we are trying with fluentd: log conversion. Input: 99.999.999.99 - - [03/Feb/2012:10:59:48 +0900] "GET /article/detail/6246245/ HTTP/1.1" 200 17509 "http://news.livedoor.com/topics/detail/6246245/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.1; .NET4.0C)" "news.livedoor.com" "xxxxxxx.xx.xxxxxxx.xxx" "-" Output: 163266 152930 news.livedoor.com /topics/detail/6242972/ GET 302 210 226 - 99.999.999.99 TQmljv9QtXkpNtCSuWVGGg Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A406 Safari/7534.48.3 TRUE TRUE FALSE FALSE FALSE FALSE FALSE Fields: hhmmdd vhost path method status bytes duration referer rhost userlabel agent FLAG [FLAGS]; FLAGS: status_redirection status_errors rhost_internal suffix_miscfile suffix_imagefile agent_bot; FLAG: logical OR of FLAGS; userlabel: hash of (tracking cookie / terminal id (mobile phone) / rhost+agent)
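The raw-to-structured conversion above can be sketched in Ruby. This is a hypothetical, heavily simplified stand-in for the production converter: the regex only covers the leading combined-format fields, and the flag rules and output column order are assumptions for illustration.

```ruby
require 'time'

# Simplified Apache combined-like prefix: rhost, time, method, path, status, bytes.
COMBINED = /\A(?<rhost>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) [^"]*" (?<status>\d+) (?<bytes>\S+)/

def to_structured(line)
  m = COMBINED.match(line) or return nil
  t = Time.strptime(m[:time], '%d/%b/%Y:%H:%M:%S %z')
  status = m[:status].to_i
  # Hypothetical subset of the FLAGS from the slide.
  flags = [
    (300..399).cover?(status),                      # status_redirection
    status >= 400,                                  # status_errors
    !!(m[:path] =~ /\.(gif|jpe?g|png)\z/i),         # suffix_imagefile (simplified)
  ]
  flag = flags.any?  # FLAG: logical OR of FLAGS
  cells = [t.strftime('%H%M%S'), m[:path], m[:method], status, m[:bytes], m[:rhost], flag, *flags]
  # Booleans become TRUE/FALSE, everything else is stringified; join as TSV.
  cells.map { |v| v == true ? 'TRUE' : v == false ? 'FALSE' : v.to_s }.join("\t")
end

puts to_structured('99.999.999.99 - - [03/Feb/2012:10:59:48 +0900] "GET /topics/detail/6242972/ HTTP/1.1" 302 210')
```

A 302 response flips `status_redirection` on, which also turns the aggregate FLAG column to TRUE.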
  • 18. What we are trying with fluentd: TimeSlicedOutput of fluentd. Traditional 'log rotation' is important, but troublesome. We want: 2/3 23:59:59 log in access.0203_23.log, 2/4 00:00:00 log in access.0204_00.log
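That requirement is what TimeSlicedOutput plugins provide: chunks are keyed by the event's own timestamp, not by flush time. A minimal sketch with the built-in file output (the path is hypothetical; `time_slice_wait` keeps a slice open briefly so a 23:59:59 event arriving after midnight still lands in the 0203_23 slice):

```
<match converted.blog>
  type file
  path /var/log/fluent/access
  time_slice_format %m%d_%H
  time_slice_wait 3m
</match>
```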
  • 19. How we did, and how we are doing now: collect → archive → convert → aggregate → show
  • 20. How we did it in the past (2011): collect (scribed) → stream → store to hdfs; archive (scribed); HIGH LATENCY: hourly/daily (time to flush + hourly invocation + running time); convert (Hadoop Streaming) 20-25mins; aggregate (Hive) on demand; show on demand
  • 21. How we are doing now: collect (Fluentd) → stream → archive (scribed); stream → convert (Fluentd) → stream → store to hdfs (over Cloudera's Hoop); VERY LOW LATENCY: 2-3 minutes (only time to wait flush); aggregate (Hive) on demand; show on demand
  • 22. crouton by kbysmnr break.
  • 23. What is important about stream processing: reasonable efficiency (compared with batch throughput); easy to re-run the same conversion as a batch; no SPOF; easy to add/remove nodes
  • 24. Stream processing and batch: how do we re-run a conversion as a batch when we hit trouble? We want to use 'just one' converter program for both stream processes and batch processes!
  • 25. out_exec_filter (fluentd built-in plugin): 1. fork and exec 'command' program 2. write data to child process stdin as TAB separated fields specified by 'in_keys' (for tag, remove_prefix available) 3. read data from child process stdout as TAB separated fields named by 'out_keys' (for tag, add_prefix available) 4. set message's timestamp by 'time_key' value in parsed data, as format specified by 'time_format'
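The four steps above map onto a single configuration block. A minimal sketch (the command path and the out_keys list are placeholders; the parameter names and values follow the worker configuration shown later in the deck):

```
<match scribe.*>
  type exec_filter
  command /path/to/convert.sh
  in_keys tag,message
  remove_prefix scribe
  out_keys tag,timefield,vhost,path,method,status
  add_prefix converted
  time_key timefield
  time_format %Y%m%d%H%M%S
</match>
```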
  • 26. 'out_exec_filter' and 'Hadoop Streaming': both read from stdin / write to stdout, with TAB separated values as input/output. WOW!!!!!!! Difference: a 'tag' may be needed with out_exec_filter; simple solution: if it doesn't exist, ignore it.
  • 27. What is important about stream processing: reasonable efficiency (compared with batch throughput); easy to re-run the same conversion as a batch; no SPOF; easy to add/remove nodes
  • 28. What are distributed stream process topologies like? deliver servers → worker servers → serializer servers → HDFS (Hoop Server), with archiver and backup servers. Redundancy and load balancing MUST be guaranteed everywhere.
  • 29. Deliver nodes: accept connections from web servers, copy messages and send to: 1. archiver (and its backup) 2. convert workers (w/ load balancing) 3. serializer; ... and useful for casual worker append/remove
  • 30. Worker nodes: under load balancing, as many workers as you want
  • 31. Serializer nodes: receive converted data stream from workers, aggregate by services, and: 1. write to storage (hdfs/hoop) 2. and...; useful to reduce overhead of storage from many concurrent write operations
  • 32. Watcher nodes: watching data for real-time workload reportings and trouble notifications: 1. for raw data from delivers 2. for structured data from serializers
  • 33. crouton by kbysmnr break.
  • 34. Implementation details: log agents on servers (scribeline); deliver (copy, in_scribe, out_scribe, out_forward); worker (in/out_forward, out_exec_filter); serializer/hooper (in/out_forward, out_hoop); watcher (in_forward, out_flowcounter, out_growthforecast)
  • 35. log agent: scribeline — log delivery agent tool, python 2.4, scribe/thrift; easy to set up and start/stop; works with any httpd configuration updates; works with logrotate-ed log files; automatic delivery target failover/takeback; (NEW) cluster support (random select from server list) https://github.com/tagomoris/scribe_line
  • 36. From scribeline To deliver: scribeline on the web servers sends (category: blog, message: RAW LOG (Apache combined + α)) over the scribe protocol to fluentd in_scribe on the deliver server (primary), with a deliver server (secondary) as fallback
  • 37. deliver 01 (primary), deliver 02 (secondary), deliver 03 (primary for high throughput nodes); xNN servers, x8 fluentd per node
  • 38. From scribeline To deliver: scribeline on the web servers sends to fluentd in_scribe on the deliver server (primary), with a deliver server (secondary) as fallback
  • 39. deliver node internal routing: on the deliver server (primary), x8 fluentd instances; in_scribe (add_prefix scribe, remove_newline true) receives (category: blog, message: RAW LOG) and emits (time: received_at, tag: scribe.blog, message: RAW LOG); copy scribe.* fans out to out_scribe (host archive.server.local, remove_prefix scribe, add_newline true) / roundrobin (see next) / out_flowcounter (see later with out_forward)
  • 40. deliver node: roundrobin strategy to workers — roundrobin with x56 substore configurations (7 workers x 8 instances): out_forward (server: worker01 port 24211, secondary server: worker02 port 24211), out_forward (server: worker01 port 24212, secondary server: worker03 port 24212), out_forward (server: worker01 port 24213, secondary server: worker04 port 24213), out_forward (server: worker01 port 24214, secondary server: worker05 port 24214), ...; messages: (time: received_at, tag: scribe.blog, message: RAW LOG)
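A sketch of that fan-out in configuration terms, assuming out_forward's standby servers play the "secondary" role from the slide (only the first two of the 56 substores shown):

```
<match scribe.*>
  type roundrobin
  <store>
    type forward
    <server>
      host worker01
      port 24211
    </server>
    <server>
      host worker02
      port 24211
      standby
    </server>
  </store>
  <store>
    type forward
    <server>
      host worker01
      port 24212
    </server>
    <server>
      host worker03
      port 24212
      standby
    </server>
  </store>
</match>
```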
  • 41. From deliver To worker: deliver fluentd (copy scribe.* → roundrobin → out_forward) sends (time: received_at, tag: scribe.blog, message: RAW LOG) to worker fluentd in_forward on worker server X (instance Xn1) or worker server Y (instance Yn2)
  • 42. worker node internal routing: worker server with x8 worker instances, x1 serializer instance; worker fluentd: in_forward → out_exec_filter scribe.* (command: convert.sh, in_keys: tag,message, remove_prefix scribe, out_keys: ......., add_prefix: converted, time_key: timefield, time_format: %Y%m%d%H%M%S) → out_forward converted.*; serializer fluentd: in_forward → out_hoop converted.blog (hoop_server servername.local, username, path /on_hdfs/%Y%m%d/blog-%H.log) and out_hoop converted.news (path /on_hdfs/%Y%m%d/news-%H.log); input (time: received_at, tag: scribe.blog, message: RAW LOG) becomes (time: written_time, tag: converted.blog, TAB separated text data [many data fields])
  • 43. out_exec_filter (review.) 1. fork and exec 'command' program 2. write data to child process stdin as TAB separated fields specified by 'in_keys' (for tag, remove_prefix available) 3. read data from child process stdout as TAB separated fields named by 'out_keys' (for tag, add_prefix available) 4. set message's timestamp by 'time_key' value in parsed data, as format specified by 'time_format'
  • 44. out_exec_filter behavior details: worker fluentd out_exec_filter scribe.* (command: convert.sh, in_keys: tag,message, remove_prefix: scribe, out_keys: ......., add_prefix: converted, time_key: timefield, time_format: %Y%m%d%H%M%S); an input message (time: received_at, tag: scribe.blog, message: RAW LOG) is written to stdin of the forked process (convert.sh -> perl convert.pl) as "blog RAWLOG"; its stdout line "blog 20120204175035 field1 field2....." becomes (time: 2012/02/04 17:50:35, tag: converted.blog, path:..., agent:..., referer:..., flag1:TRUE)
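The forked converter's stdin/stdout contract above (tag first, TAB separated in, then tag plus a %Y%m%d%H%M%S timefield and data fields TAB separated out) can be sketched in Ruby. The real converter is a Perl script; the conversion logic here is a placeholder, only the framing matches the slide.

```ruby
# Reads "tag<TAB>rawlog" lines from stdin, writes
# "tag<TAB>timefield<TAB>field1<TAB>field2..." to stdout,
# which is the shape out_exec_filter expects given
# in_keys tag,message / out_keys tag,timefield,... / time_key timefield.
STDOUT.sync = true  # stream each result back without buffering delays

def convert(tag, raw)
  timefield = Time.now.strftime('%Y%m%d%H%M%S')
  fields = raw.split(' ', 3)  # placeholder "conversion" of the raw log
  ([tag, timefield] + fields).join("\t")
end

def run_filter(input = $stdin, output = $stdout)
  input.each_line do |line|
    tag, raw = line.chomp.split("\t", 2)
    next if raw.nil?  # skip malformed lines without a TAB
    output.puts convert(tag, raw)
  end
end

# run_filter  # uncomment when running as the out_exec_filter command
```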
  • 45. From serializer To HDFS (Hoop): serializer fluentd on the worker server: in_forward → out_hoop converted.blog (hoop_server servername.local, username, path /on_hdfs/%Y%m%d/blog-%H.log) and out_hoop converted.news (path /on_hdfs/%Y%m%d/news-%H.log); TAB separated text data (time: written_time, tag: converted.blog, [many data fields]) is written over HTTP to the Hoop Server beside the Hadoop NameNode, into HDFS
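That write path might look like this fluent-plugin-hoop sketch; the server port and username are placeholders, the path pattern is the one from the slide:

```
<match converted.blog>
  type hoop
  hoop_server servername.local:14000
  username hadoopuser
  path /on_hdfs/%Y%m%d/blog-%H.log
</match>
```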
  • 46. Overview: deliver node cluster (deliver servers, archiver, backup) → worker node cluster (worker servers, serializers) → HDFS (Hoop Server)
  • 48. Traffics: Bytes/sec (on deliver 2/3-4) [graph]
  • 49. Traffics: Messages/sec (on deliver 2/3-4) [graph]
  • 51. Traffics: workers total network traffics [graph]
  • 53. Fluentd stream processing: Finally, it works fine now. Log conversion latency dramatically reduced. Many useful plugins for monitoring are waiting to be shipped. Hundreds of cool features to implement are also waiting for us!
  • 54. Thank you! crouton by kbysmnr
  • 55. crouton by kbysmnr Appendix
  • 56. input traffics: by fluent-plugin-flowcounter — on the deliver server (primary), x8 fluentd instances; in_scribe (add_prefix scribe, remove_newline true) receives (category: blog, message: RAW LOG) and emits (time: received_at, tag: scribe.blog, message: RAW LOG); copy scribe.* fans out to out_scribe (host archive.server.local, remove_prefix scribe, add_newline true) / roundrobin (see next) / out_flowcounter
  • 57. bytes/messages counting on fluentd: 1. 'out_flowcounter' counts input messages, their size (specified fields) and their rate (/sec) 2. counting results are emitted per minute/hour/day 3. worker fluentd sends results to the 'Watcher' node over out_forward 4. Watcher receives counting results and passes them to 'out_growthforecast'; 'GrowthForecast' is a graph drawing tool with a REST API for data registration, by kazeburo
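Steps 1-4 could be wired together roughly like this sketch; the tag, graph names and the GrowthForecast URL are hypothetical, and the hop through out_forward to the watcher node is omitted for brevity:

```
# on the deliver/worker node: count messages, bytes and rates per minute
<match scribe.*>
  type flowcounter
  count_keys message
  unit minute
  tag flowcount
</match>

# on the watcher node: register each counting result as GrowthForecast graph data
<match flowcount>
  type growthforecast
  gfapi_url http://watcher.local/api/
  service fluentd
  section deliver
  name_keys blog_count,blog_bytes
</match>
```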
  • 58. Why not out_forward roundrobin in deliver? out_forward roundrobin is per buffer flushing! (per buffer size, or flush_interval) For a high throughput stream, this unit is too large. We need roundrobin per 'emit'.
  • 59. deliver node: roundrobin strategy to workers — roundrobin with x56 substore configurations (7 workers x 8 instances): out_forward (server: worker01 port 24211, secondary server: worker02 port 24211), out_forward (server: worker01 port 24212, secondary server: worker03 port 24212), out_forward (server: worker01 port 24213, secondary server: worker04 port 24213), out_forward (server: worker01 port 24214, secondary server: worker05 port 24214); messages: (time: received_at, tag: scribe.blog, message: RAW LOG)
  • 60. Why not out_forward roundrobin in deliver? out_forward roundrobin is per buffer flushing! (per buffer size, or flush_interval) For a high throughput stream, this unit is too large. We need roundrobin per 'emit'.
  • 61. From worker To serializer: details — worker server with x8 worker instances, x1 serializer instance; worker fluentd out_forward converted.* (server: localhost; secondary: worker1, worker2, worker3, worker4, worker5, worker6, worker7) → serializer fluentd in_forward; normally sends to localhost; in trouble, balances all traffic to all other workers' serializers
  • 62. Software list: scribed: github.com/facebook/scribe/ scribeline: github.com/tagomoris/scribe_line fluent-plugin-scribe: github.com/fluent/fluent-plugin-scribe Hoop: http://cloudera.github.com/hoop/docs/latest/ServerSetup.html fluent-plugin-hoop: github.com/fluent/fluent-plugin-hoop GrowthForecast: github.com/kazeburo/growthforecast fluent-plugin-growthforecast: github.com/tagomoris/fluent-plugin-growthforecast fluent-plugin-flowcounter: github.com/tagomoris/fluent-plugin-flowcounter