Fabric, Cuisine &
   Watchdog

  Sébastien Pierre, ffunction inc.
@Montréal Python, February 2011

          www.ffctn.com


                                     ffunction
                                     inc.
How to use Python for
Server Administration
                             Thanks to
                              Fabric
                             Cuisine*
                           & Watchdog*
                             *custom tools




The way we use
    servers
 has changed




The era of dedicated servers

Hosted in your server room or in colocation

         WEB                        DATABASE    EMAIL
        SERVER                       SERVER    SERVER

Sysadmins typically SSH and configure the servers live.
The servers are conservatively managed, updates are risky.

The era of slices/VPS

Linode.com                                 Amazon EC2

[Diagram: several numbered slices hosted at each provider]

IWeb.com

      DEDICATED           DEDICATED
       SERVER 1            SERVER 2

We now have multiple small virtual servers (slices/VPS),
often located in different data-centers,
...and sometimes with different providers.
We even sometimes still have physical, dedicated servers.

The challenge

ORDER                   SETUP                  DEPLOY
SERVER                 SERVER                APPLICATION

         MAKE THIS PROCESS AS FAST (AND SIMPLE)
                      AS POSSIBLE

SETUP SERVER:
    Create users, groups
    Customize config files
    Install base packages

DEPLOY APPLICATION:
    Install app-specific packages
    Deploy application
    Start services

Quickly integrate your new server in the existing architecture
...and make sure it's running!

Today's menu

 FABRIC          Interact with your remote machines
                 as if they were local

 CUISINE         Takes care of users, groups, packages
                 and configuration of your new machine

 WATCHDOG        Ensures that your servers and services
                 are up and running

Cuisine and Watchdog are made by ffunction.

Part 1
         Fabric - http://fabfile.org




application deployment & systems administration tasks

Fabric is a Python library
and command-line tool
for streamlining the use of SSH
for application deployment
or systems administration tasks.

Wait... what does that mean?

Streamlining SSH

By hand:

import os
version = os.popen("ssh myserver 'cat /proc/version'").read()

Using Fabric:

from fabric.api import *
env.hosts = ["myserver"]
version = run("cat /proc/version")

You can specify multiple hosts and run the same commands across them.
Connections will be lazily created and pooled.
Failures ($STATUS) will be handled just like in Make.

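To make the "by hand" side concrete, here is a minimal standard-library sketch of what running one remote command involves. The helper names ssh_argv and run_by_hand are hypothetical (they are not part of Fabric), and the sketch assumes an `ssh` client on the PATH with key-based login already set up.

```python
import subprocess


def ssh_argv(host, command):
    """Build the argument list that runs `command` on `host` over ssh."""
    # A list (instead of one shell string) avoids quoting problems locally.
    return ["ssh", host, command]


def run_by_hand(host, command):
    """Run `command` on `host` and return its standard output as text."""
    # check=True raises CalledProcessError on a non-zero exit status,
    # so a failing remote command stops the script, much like Make would.
    result = subprocess.run(
        ssh_argv(host, command),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Usage, with a hypothetical host name:
#   version = run_by_hand("myserver", "cat /proc/version")
```

Unlike Fabric, this opens a new SSH connection for every call and knows nothing about host lists, which is the bookkeeping Fabric takes over.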
Example: Installing packages

sudo("aptitude install nginx")

It's easy to take action depending on the result:

if run("dpkg -s %s | grep 'Status:' ; true" % package).find("installed") == -1:
    sudo("aptitude install '%s'" % (package))

Note that we add true so that the run() always succeeds*
* there are other ways...

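The "act on the result" idea can be sketched without a server by separating the parsing from the remote calls. In this sketch is_installed and ensure_package are hypothetical helpers, and run and sudo are passed in as plain callables standing in for Fabric's functions. The status check compares the last word of the Status line, an assumption made so that states such as "not-installed" are not mistaken for "installed".

```python
def is_installed(dpkg_status_output):
    """Return True if `dpkg -s <package>` output reports an installed package."""
    for line in dpkg_status_output.splitlines():
        if line.startswith("Status:"):
            # A fully installed package reports "Status: install ok installed".
            return line.split()[-1] == "installed"
    # No Status line at all: dpkg does not know the package.
    return False


def ensure_package(package, run, sudo):
    """Install `package` only when needed; return True if an install was issued."""
    output = run("dpkg -s %s | grep 'Status:' ; true" % package)
    if is_installed(output):
        return False
    sudo("aptitude install '%s'" % package)
    return True
```

Passing run and sudo as arguments keeps the decision logic testable with fake callables before pointing it at a real machine.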
Example: retrieving system status

disk_usage = run("df -kP")
mem_usage = run("cat /proc/meminfo")
cpu_usage = run("cat /proc/stat")

print disk_usage, mem_usage, cpu_usage

Very useful for getting live information from many different servers

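Once the raw text is back on the local side, turning it into numbers is ordinary Python. The sketch below parses the text of /proc/meminfo; parse_meminfo and memory_used_ratio are hypothetical helper names, and "used" is simply taken here as everything that is not MemFree.

```python
def parse_meminfo(text):
    """Parse the text of /proc/meminfo into a dict of integer values."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, rest = line.partition(":")
        fields = rest.split()
        # Most entries look like "MemTotal:  2048000 kB"; keep the number only.
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def memory_used_ratio(info):
    """Fraction of total memory that is not free, from a parsed meminfo dict."""
    return 1.0 - float(info["MemFree"]) / info["MemTotal"]


# Usage with Fabric would look like:
#   info = parse_meminfo(run("cat /proc/meminfo"))
```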
Fabfile.py

from fabric.api import *
from mysetup    import *

env.hosts = ["server1.myapp.com"]

def setup():
    install_packages("...")
    update_configuration()
    create_users()
    start_daemons()

$ fab setup

Just like Make, you write rules that do something
...and you can specify on which servers the rules will run

Multiple hosts

env.hosts = [
   "db1.myapp.com",
   "db2.myapp.com",
   "db3.myapp.com"
]

@hosts("db1.myapp")
def backup_db():
   run(...)

Roles

env.roledefs = {
    'web': ['www1', 'www2', 'www3'],
    'dns': ['ns1', 'ns2']
}

$ fab -R web setup

Will run the setup rule only on hosts that are members of the web role.

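As a rough illustration of what selecting hosts by role amounts to, here is a small sketch. hosts_for_roles is a hypothetical helper written for this example; it is not Fabric's own implementation.

```python
def hosts_for_roles(roledefs, roles):
    """Return the hosts belonging to any of `roles`, in order, without duplicates."""
    hosts = []
    for role in roles:
        for host in roledefs[role]:
            if host not in hosts:
                hosts.append(host)
    return hosts


roledefs = {
    'web': ['www1', 'www2', 'www3'],
    'dns': ['ns1', 'ns2'],
}

# Roughly the set of hosts that "fab -R web setup" would target.
web_hosts = hosts_for_roles(roledefs, ['web'])
```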
What's good about Fabric?


Low-level
Basically an ssh() command that returns the result
Simple primitives
run(), sudo(), get(), put(), local(), prompt(), reboot()
No magic
No DSL, no abstraction, just a remote command API




What could be improved ?


Ease common admin tasks
User, group creation. Files, directory operations.
Abstract primitives
Like installing a package, so that it works across different OSes
Templates
To make creating/updating configuration files easy




Cuisine:
Chef-like functionality for Fabric




Part 2
Cuisine




What is Opscode's Chef?
           http://wiki.opscode.com/display/chef/Home


Recipes
Scripts/packages to install and configure services and
applications
API
A DSL-like Ruby API to interact with the OS (create
users, groups, install packages, etc)
Architecture
Client-server or “solo” mode to push and deploy your
new configurations

What I liked about Chef


Flexible
You can use the API or shell commands
Structured
Helped me have a clear decomposition of the services
installed per machine
Community
Lots of recipes already available from
http://cookbooks.opscode.com/



What I didn't like


Too many files and directories
Code is spread out, hard to get the big picture
Abstraction overload
API not very well documented, frequent fallbacks to
plain shell scripts within the recipe
No “smart” recipe
Recipes are applied all the time, even when it's not
necessary


The question that kept coming...

Django recipe: 5 files, 2 directories

What it does, in essence:

sudo aptitude install apache2 python python-django

Is this really necessary for what I want to do?

What I loved about Fabric


Bare metal
ssh() function, simple and elegant set of primitives
No magic
No abstraction, no model, no compilation
Two-way communication
Easy to change the rule's behaviour according to the
output (ex: do not install something that's already
installed)


What I needed

            Text processing & Templates

File I/O      User/Group Management      Package Management

                      Fabric

How I wanted it


Simple “flat” API
[object]_[operation], where operation is one of “create”,
“read”, “update”, “write”, “remove”, “ensure”, etc.
Driven by need
Only implement a feature if I have a real need for it
No magic
Everything is implemented using sh-compatible commands
No unnecessary structure
Everything fits in one file, no imposed file layout


Cuisine: Example fabfile.py

from cuisine import *      # Fabric's core functions are already imported

env.hosts = ["server1.myapp.com"]

def setup():
   # Cuisine's API calls
   package_ensure("python", "apache2", "python-django")
   user_ensure("admin", uid=2000)
   upstart_ensure("django")

$ fab setup

File I/O


Cuisine : File I/O

● file_exists       does the remote file exist?
● file_read         reads the remote file
● file_write        writes data to the remote file
● file_append       appends data to the remote file
● file_attribs      chmod & chown
● file_remove

Supports owner/group and mode change

Cuisine : File I/O (directories)

● dir_exists        does the remote directory exist?
● dir_ensure        ensures that a directory exists
● dir_attribs       chmod & chown
● dir_remove

Cuisine : File I/O +

● file_update(location, updater=lambda _:_)

     package_ensure("mongodb-snapshot")
     def update_configuration( text ):
         res = []
         for line in text.split("\n"):
             if line.strip().startswith("dbpath="):
                 res.append("dbpath=/data/mongodb")
             elif line.strip().startswith("logpath="):
                 res.append("logpath=/data/logs/mongodb.log")
             else:
                 res.append(line)
         return "\n".join(res)
     file_update("/etc/mongodb.conf", update_configuration)

This replaces the values for configuration entries dbpath and logpath.
The remote file will only be changed if the content is different.

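The "only changed if the content is different" behaviour can be sketched as read, transform, compare, then write. This is an illustration and not Cuisine's source: file_update_sketch is a hypothetical name, and the read and write callables stand in for whatever fetches and stores the remote file.

```python
def file_update_sketch(read, write, location, updater=lambda text: text):
    """Apply `updater` to the file's text; write back only when it changed."""
    old = read(location)
    new = updater(old)
    if new == old:
        # Nothing to do: the file already has the desired content.
        return False
    write(location, new)
    return True


def update_configuration(text):
    """Point the dbpath and logpath entries at /data, keeping other lines."""
    res = []
    for line in text.split("\n"):
        if line.strip().startswith("dbpath="):
            res.append("dbpath=/data/mongodb")
        elif line.strip().startswith("logpath="):
            res.append("logpath=/data/logs/mongodb.log")
        else:
            res.append(line)
    return "\n".join(res)
```

Because the updater is a plain text-to-text function, running the rule a second time leaves the file untouched, which is what makes it safe to re-run.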
User Management


Cuisine: User Management

● user_exists      does the user exist?
● user_create      create the user
● user_ensure      create the user if it doesn't exist

Cuisine: Group Management

● group_exists       does the group exist?
● group_create       create the group
● group_ensure       create the group if it doesn't exist
● group_user_exists  does the user belong to the group?
● group_user_add     adds the user to the group
● group_user_ensure

Package Management


Cuisine: Package Management

● package_exists      is the package available?
● package_installed   is it installed?
● package_install     install the package
● package_ensure      ... only if it's not installed
● package_upgrade     upgrades the/all package(s)

Text & Templates


Cuisine: Text transformation

text_ensure_line(text, lines)

file_update(
   "/home/user/.profile",
   lambda _:text_ensure_line(_,
      "PYTHONPATH=/opt/lib/python:${PYTHONPATH};"
      "export PYTHONPATH"
))

Ensures that the PYTHONPATH variable is set and exported.
If not, these lines will be appended.

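The "append only if missing" behaviour described above can be sketched as a pure text function. ensure_lines is a hypothetical stand-in written for this example, not Cuisine's implementation, and it assumes "\n" line endings.

```python
def ensure_lines(text, *lines):
    """Return `text` with each of `lines` appended unless already present."""
    existing = text.split("\n") if text else []
    for line in lines:
        # Compare whole lines, so a partial match does not count as present.
        if line not in existing:
            existing.append(line)
    return "\n".join(existing)


profile = "set -o vi"
updated = ensure_lines(profile, "export PYTHONPATH")
```

Applying the function to its own output returns the same text, so it combines naturally with an updater that only rewrites the file when the content changed.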
Cuisine: Text transformation



text_replace_line(text, old, new, find=.., process=...)


configuration = local_read("server.conf")
for key, value in variables.items():
   configuration, replaced = text_replace_line(
      configuration,
      key + "=",
      key + "=" + repr(value),
      process=lambda text:text.split("=")[0].strip()
   )


                                                      ffunction
                                                      inc.
Cuisine: Text transformation


                                   Replaces lines that look like
                                    Replaces lines that look like
                                         VARIABLE=VALUE
text_replace_line(text, old, new, find=.., process=...)
                                          VARIABLE=VALUE
                                  with the actual values from the
                                   with the actual values from the
                                        variables dictionary.
                                         variables dictionary.


configuration = local_read("server.conf")
for key, value in variables.items():
   configuration, replaced = text_replace_line(
      configuration,
      key + "=",
      key + "=" + repr(value),
      process=lambda text:text.split("=")[0].strip()
   )


                                                                 ffunction
                                                                 inc.
The process lambda transforms input lines before comparing them. Here, each
line is stripped of its value and of surrounding spaces, so only the variable
name is compared.
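To make the role of process concrete, here is a minimal, self-contained sketch of a function with the same signature as text_replace_line. It is a simplified stand-in written for illustration, not Cuisine's actual code: the name replace_line, the sample configuration, and the assumption that process is applied to both the candidate line and old are all hypothetical.

```python
def replace_line(text, old, new, find=lambda a, b: a == b, process=lambda s: s):
    """Replace every line matching `old` with `new`; return (text, count)."""
    result, replaced = [], 0
    for line in text.split("\n"):
        # Assumption: `process` normalizes both the line and the reference
        if find(process(line), process(old)):
            result.append(new)
            replaced += 1
        else:
            result.append(line)
    return "\n".join(result), replaced


# Hypothetical sample data, not taken from the slides
variables = {"PORT": 8080, "HOST": "example.org"}
configuration = "PORT = 80\nHOST=localhost\nDEBUG=0"

# Keep only the variable name: drop the value, then strip spaces
key_only = lambda line: line.split("=")[0].strip()

for key, value in variables.items():
    configuration, _ = replace_line(
        configuration, key + "=", key + "=" + repr(value), process=key_only
    )
print(configuration)
```

Because the comparison only sees the variable name, "PORT = 80" and "PORT=80" are both matched, while DEBUG=0 is left untouched.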


Cuisine: Text transformation



text_strip_margin(text)


file_write(".profile", text_strip_margin(
   """
   |export PATH="$HOME/bin":$PATH
   |set -o vi
   """
))




Everything after the | separator will be output as content. This makes it easy
to embed text templates within functions.
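The margin-stripping behaviour can be sketched in a few lines of plain Python. This is an illustrative approximation under the assumption that lines without a | marker are dropped; strip_margin is a hypothetical name, not the Cuisine function itself.

```python
def strip_margin(text, margin="|"):
    """Keep only what follows the first `margin` character on each line."""
    lines = []
    for line in text.split("\n"):
        _, found, content = line.partition(margin)
        if found:  # assumption: lines without a margin marker are dropped
            lines.append(content)
    return "\n".join(lines)


profile = strip_margin(
    """
    |export PATH="$HOME/bin":$PATH
    |set -o vi
    """
)
print(profile)
```

The indentation before the | never reaches the output, so the template can follow the indentation of the surrounding Python code.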




Cuisine: Text transformation



text_template(text, variables)
text_template(text_strip_margin(
   """
   |cd ${DAEMON_PATH}
   |exec ${DAEMON_EXEC_PATH}
   """
), dict(
   DAEMON_PATH="/opt/mongodb",
   DAEMON_EXEC_PATH="/opt/mongodb/mongod"
))


This is a simple wrapper around Python's (safe) string.Template substitution.
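For reference, this is roughly what "safe" substitution means with the standard library's string.Template. The render helper and the ${EXTRA_ARGS} placeholder are hypothetical names added for the example; only Template.safe_substitute is the real standard-library call.

```python
from string import Template


def render(text, variables):
    # safe_substitute leaves unknown ${NAMES} untouched instead of raising KeyError
    return Template(text).safe_substitute(variables)


script = render(
    "cd ${DAEMON_PATH}\nexec ${DAEMON_EXEC_PATH} ${EXTRA_ARGS}",
    dict(DAEMON_PATH="/opt/mongodb", DAEMON_EXEC_PATH="/opt/mongodb/mongod"),
)
print(script)
```

A missing variable such as EXTRA_ARGS stays visible in the output rather than aborting the deployment script halfway through.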


Cuisine: Goodies



●   ssh_keygen       generates DSA keys
●   ssh_authorize    authorizes your key on the remote server
●   mode_sudo        run() always uses sudo
●   upstart_ensure   ensures the given daemon is running


    & more!



Why use Cuisine ?

●   Simple API for remote-server manipulation
    Files, users, groups, packages
●   Shell commands for specific tasks only
    Avoid problems with your shell commands by
    only using run() for very specific tasks
●   Cuisine tasks are not stupid
    *_ensure() commands won't do anything if it's
    not necessary
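The "won't do anything if it's not necessary" idea is what makes *_ensure() commands safe to run repeatedly. The sketch below illustrates the pattern on a local file; file_ensure_line is a hypothetical helper invented for this example and is not part of Cuisine's API.

```python
import os
import tempfile


def file_ensure_line(path, line):
    """Append `line` to `path` unless already present; return True if changed."""
    existing = []
    if os.path.exists(path):
        with open(path) as f:
            existing = f.read().splitlines()
    if line in existing:
        return False  # already in the desired state: do nothing
    with open(path, "a") as f:
        f.write(line + "\n")
    return True


path = os.path.join(tempfile.mkdtemp(), "profile")
first = file_ensure_line(path, "set -o vi")   # changes the file
second = file_ensure_line(path, "set -o vi")  # no-op on the second run
print(first, second)
```

Running the same setup script twice leaves the target in the same state, which is the property the slide is describing.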

Limitations

●   Limited to sh-shells
    Operations will not work under csh
●   Only written/tested for Ubuntu Linux
    Contributors could easily port commands




Get started !




             On Github:
http://github.com/sebastien/cuisine

        1 short Python file
         Documented API



Part 3
       Watchdog




Server and services monitoring

The problem

Low disk space
    Archive files
    Rotate logs
    Purge cache
HTTP server has high latency
    Restart HTTP server
System load is too high
    Re-nice important processes

We want to be notified
when incidents happen




We want automatic actions to be taken
         whenever possible




(Some of the) existing solutions


Monit, God, Supervisord, Upstart
    Focus on starting/restarting daemons and services
Munin, Cacti
    Focus on visualization of RRDTool data
Collectd
    Focus on collecting and publishing data


The ideal tool


Wide spectrum
    Data collection, service monitoring, actions
Easy setup and deployment
    No complex installation or configuration
Flexible server architecture
    Can monitor local or remote processes
Customizable and extensible
    From restarting daemons to monitoring whole servers

Hello, Watchdog!

SERVICE
    A service is a collection of RULES.

RULE      HTTP Request
          CPU, Disk, Mem %
          Process status
          I/O Bandwidth
    Each rule retrieves data and processes it. Rules can SUCCEED or FAIL.

ACTION    Logging
          XMPP, Email notifications
          Start/stop process
          ….
    Actions are bound to rules and triggered on rule SUCCESS or FAILURE.
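The relationship between services, rules and actions can be pictured with a small model. This is only a sketch of the concepts described above, not Watchdog's implementation: the class names mirror the slide vocabulary, but their constructors, the run() methods, and the convention that a raised exception means failure are assumptions made for the example.

```python
class Rule:
    """A check bound to actions that run on success or on failure."""

    def __init__(self, check, success=(), fail=()):
        self.check, self.success, self.fail = check, success, fail

    def run(self):
        try:
            result, ok = self.check(), True
        except Exception as error:  # assumption: any exception is a failure
            result, ok = error, False
        for action in (self.success if ok else self.fail):
            action(result)
        return ok


class Service:
    """A named collection of rules."""

    def __init__(self, name, rules):
        self.name, self.rules = name, rules

    def run(self):
        return [rule.run() for rule in self.rules]


def disk_full():
    # Hypothetical check that always fails, for demonstration
    raise RuntimeError("disk usage above 90%")


log = []
service = Service("demo", [
    Rule(lambda: 0.35, success=[lambda r: log.append("cpu=%s" % r)]),
    Rule(disk_full, fail=[lambda r: log.append("FAILED: %s" % r)]),
])
outcome = service.run()
print(outcome, log)
```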



Execution Model

MONITOR
    Services are registered in the monitor.

SERVICE DEFINITION
    RULE (frequency in ms)
        Rules defined in the service are executed every N ms (frequency).

        SUCCESS    ACTION, ACTION
            If the rule SUCCEEDS, its actions will be executed sequentially.

        FAILURE    ACTION
            If the rule FAILS, its failure actions will be executed sequentially.
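The "executed every N ms" part of the model can be sketched as a tiny scheduler. This is a hedged illustration, not Watchdog's actual monitor: the Monitor class below, its register() and tick() methods, and the simulated clock are all assumptions chosen so the example runs deterministically (a real monitor would presumably sleep between iterations and may use threads).

```python
class Monitor:
    """Runs each registered rule whenever its period has elapsed."""

    def __init__(self):
        self.entries = []  # each entry: [rule, period_ms, next_run_ms]

    def register(self, rule, freq_ms):
        self.entries.append([rule, freq_ms, 0])

    def tick(self, now_ms):
        """Run every rule that is due at `now_ms` and reschedule it."""
        for entry in self.entries:
            rule, freq_ms, next_run = entry
            if now_ms >= next_run:
                rule()
                entry[2] = now_ms + freq_ms


calls = []
monitor = Monitor()
monitor.register(lambda: calls.append("fast"), freq_ms=500)
monitor.register(lambda: calls.append("slow"), freq_ms=1000)

# Simulated clock: 0 ms to 1750 ms in 250 ms steps
for now in range(0, 2000, 250):
    monitor.tick(now)
print(calls)
```

Over two simulated seconds, the 500 ms rule runs four times and the 1000 ms rule runs twice.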




Monitoring a remote machine

#!/usr/bin/env python
from watchdog import *
Monitor(
   Service(
       name = "google-search-latency",
       monitor = (
          HTTP(
              GET="http://www.google.ca/search?q=watchdog",
              freq=Time.s(1),
              timeout=Time.ms(80),
              fail=[
                 Print("Google search query took more than 50ms")
              ]
          )
       )
   )
).run()



A monitor is like the “main” for Watchdog: it actively monitors services. Don't
forget to call run() on it.

The service monitors the rules. Here, the HTTP rule tests a URL, and we display
a message in case of failure.

If there is a 4XX response or the request times out, the rule will fail and
display an error message.

Monitoring a remote machine

$ python example-service-monitoring.py

2011-02-27T22:33:18 watchdog --- #0 (runners=1,threads=2,duration=0.57s)
2011-02-27T22:33:18 watchdog [!] Failure on HTTP(GET="www.google.ca:80/search?q=watchdog",timeout=0.08) : Socket error: timed out
Google search query took more than 50ms
2011-02-27T22:33:19 watchdog --- #1 (runners=1,threads=2,duration=0.73s)
2011-02-27T22:33:20 watchdog --- #2 (runners=1,threads=2,duration=0.54s)
2011-02-27T22:33:21 watchdog --- #3 (runners=1,threads=2,duration=0.69s)
2011-02-27T22:33:22 watchdog --- #4 (runners=1,threads=2,duration=0.77s)
2011-02-27T22:33:23 watchdog --- #5 (runners=1,threads=2,duration=0.70s)




Sending Email Notification

send_email = Email(
   "notifications@ffctn.com",
   "[Watchdog]Google Search Latency Error", "Latency was over 80ms",
   "smtp.gmail.com", "myusername", "mypassword"
)

[…]
HTTP(
    GET="http://www.google.ca/search?q=watchdog",
    freq=Time.s(1),
    timeout=Time.ms(80),
    fail=[
       send_email
    ]
)




The Email action will send an email to notifications@ffctn.com when triggered.
Listing send_email under fail=[...] is how we bind the action to the rule
failure.

Sending Email+Jabber Notification

send_xmpp = XMPP(
   "notifications@jabber.org",
   "Watchdog: Google search latency over 80ms",
   "myuser@jabber.org", "myspassword"
)

[…]
HTTP(
    GET="http://www.google.ca/search?q=watchdog",
    freq=Time.s(1),
    timeout=Time.ms(80),
    fail=[
       send_email, send_xmpp
    ]
)




Monitoring incident: when something
fails repeatedly during a given period of
                  time




You don't want to be notified all the time, only when it really matters.

Detecting incidents

HTTP(
   GET="http://www.google.ca/search?q=watchdog",
   freq=Time.s(1),
   timeout=Time.ms(80),
   fail=[
      Incident(
          errors = 5,
          during = Time.s(10),
          actions = [send_email,send_xmpp]
      )
   ]
)




An incident is a “smart” action: it will only do something when the condition
is met. When at least 5 errors happen over a 10-second period, the Incident
action will trigger the given actions.
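One way to picture an Incident is as a sliding-window counter over failure timestamps. The sketch below is an illustration under stated assumptions, not Watchdog's code: the on_failure() method, the explicit now argument, and the choice to reset the counter after triggering are all hypothetical.

```python
from collections import deque


class Incident:
    """Run `actions` only once `errors` failures occur within `during` seconds."""

    def __init__(self, errors, during, actions):
        self.errors, self.during, self.actions = errors, during, actions
        self.timestamps = deque()

    def on_failure(self, now):
        self.timestamps.append(now)
        # Forget failures that fell out of the sliding window
        while self.timestamps and now - self.timestamps[0] > self.during:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.errors:
            self.timestamps.clear()  # assumption: start afresh after triggering
            for action in self.actions:
                action()
            return True
        return False


sent = []
incident = Incident(errors=5, during=10.0, actions=[lambda: sent.append("email")])

# Isolated failures, more than 10 s apart, never fill the window...
results = [incident.on_failure(t) for t in (0, 10.5, 21, 31.5)]
# ...but five failures within 4 s do.
results += [incident.on_failure(t) for t in (50, 51, 52, 53, 54)]
print(results, sent)
```

Only the last failure triggers the notification, so occasional timeouts stay silent while a sustained outage is reported once.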




Example: Ensuring a service is running

from watchdog import *
Monitor(
    Service(
       name="myservice-ensure-up",
       monitor=(
           HTTP(
              GET="http://localhost:8000/",
              freq=Time.ms(500),
              fail=[
                 Incident(
                     errors=5,
                     during=Time.s(5),
                     actions=[
                        Restart("myservice-start.py")
])] )))).run()




We test whether we can GET http://localhost:8000 every 500ms. If we can't reach
it 5 times within 5 seconds, we kill and restart myservice-start.py.

Example: Monitoring system health

from watchdog import *
Monitor (
     Service(
          name    = "system-health",
          monitor = (
               SystemInfo(freq=Time.s(1),
                    success = (
                         LogResult("myserver.system.mem", extract=lambda r,_:r["memoryUsage"]),
                         LogResult("myserver.system.disk", extract=lambda r,_:reduce(max,r["diskUsage"].values())),
                         LogResult("myserver.system.cpu", extract=lambda r,_:r["cpuUsage"]),
                    )
               ),
               Delta(
                    Bandwidth("eth0", freq=Time.s(1)),
                    extract = lambda v:v["total"]["bytes"]/1000.0/1000.0,
                    success = [LogResult("myserver.system.eth0.sent")]
               ),
               SystemHealth(
                    cpu=0.90, disk=0.90, mem=0.90,
                    freq=Time.s(60),
                    fail=[Log(path="watchdog-system-failures.log")]
               ),
          )
     )
).run()

Monitoring system health
                            SystemInfo will retrieve
                             SystemInfo will retrieve
                            system information and
                             system information and
from watchdog import *      return it as a dictionary
Monitor (
                             return it as a dictionary
     Service(
          name    = "system-health",
          monitor = (
               SystemInfo(freq=Time.s(1),
                    success = (
                         LogResult("myserver.system.mem", extract=lambda r,_:r["memoryUsage"]),
                         LogResult("myserver.system.disk", extract=lambda
r,_:reduce(max,r["diskUsage"].values())),
                         LogResult("myserver.system.cpu", extract=lambda r,_:r["cpuUsage"]),
                    )
               ),
               Delta(
                    Bandwidth("eth0", freq=Time.s(1)),
                    extract = lambda v:v["total"]["bytes"]/1000.0/1000.0,
                    success = [LogResult("myserver.system.eth0.sent")]
               ),
               SystemHealth(
                    cpu=0.90, disk=0.90, mem=0.90,
                    freq=Time.s(60),
                    fail=[Log(path="watchdog-system-failures.log")]
               ),
          )
     )
).run()

                                                                                ffunction
                                                                                inc.
Monitoring system health
                                                                        We log each result by
                                                                         We log each result by
                                                                         extracting the given
from watchdog import *                                                    extracting the given
                                                                        value from the result
Monitor (                                                                value from the result
     Service(                                                          dictionary (memoryUsage,
          name    = "system-health",                                    dictionary (memoryUsage,
                                                                           diskUsage,cpuUsage)
          monitor = (                                                       diskUsage,cpuUsage)
               SystemInfo(freq=Time.s(1),
                    success = (
                         LogResult("myserver.system.mem=", extract=lambda r,_:r["memoryUsage"]),
                         LogResult("myserver.system.disk=", extract=lambda
r,_:reduce(max,r["diskUsage"].values())),
                         LogResult("myserver.system.cpu=", extract=lambda r,_:r["cpuUsage"]),
                    )
               ),
               Delta(
                    Bandwidth("eth0", freq=Time.s(1)),
                    extract = lambda v:v["total"]["bytes"]/1000.0/1000.0,
                    success = [LogResult("myserver.system.eth0.sent")]
               ),
               SystemHealth(
                    cpu=0.90, disk=0.90, mem=0.90,
                    freq=Time.s(60),
                    fail=[Log(path="watchdog-system-failures.log")]
               ),
          )
     )
).run()
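
To make the extract callbacks concrete, here is a minimal sketch that applies the same three lambdas to a hand-written result dictionary. Only the keys (memoryUsage, diskUsage, cpuUsage) come from the slide. The `sample_result` name, the values, and the idea that diskUsage maps mount points to usage ratios are assumptions for illustration, and Watchdog itself is not imported.

```python
from functools import reduce  # built in on Python 2, must be imported on Python 3

# Hypothetical result dictionary: the values and mount points are invented.
sample_result = {
    "memoryUsage": 0.42,
    "diskUsage": {"/": 0.61, "/var": 0.87, "/home": 0.30},
    "cpuUsage": 0.15,
}

# The same extract callbacks as on the slide; the second argument is unused.
extract_mem = lambda r, _: r["memoryUsage"]
extract_disk = lambda r, _: reduce(max, r["diskUsage"].values())
extract_cpu = lambda r, _: r["cpuUsage"]

mem = extract_mem(sample_result, None)
disk = extract_disk(sample_result, None)  # the fullest disk wins
cpu = extract_cpu(sample_result, None)

# Mimic the "prefix + value" lines that the slide's LogResult calls suggest.
print("myserver.system.mem=%s" % mem)
print("myserver.system.disk=%s" % disk)
print("myserver.system.cpu=%s" % cpu)
```

The disk callback reduces the per-disk values to a single number, so one log line reports the worst case across all disks.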

Monitoring system health

Bandwidth collects live traffic information from a network interface.

from watchdog import *
Monitor (
     Service(
          name    = "system-health",
          monitor = (
               SystemInfo(freq=Time.s(1),
                    success = (
                         LogResult("myserver.system.mem=", extract=lambda r,_:r["memoryUsage"]),
                         LogResult("myserver.system.disk=", extract=lambda r,_:reduce(max,r["diskUsage"].values())),
                         LogResult("myserver.system.cpu=", extract=lambda r,_:r["cpuUsage"]),
                    )
               ),
               Delta(
                    Bandwidth("eth0", freq=Time.s(1)),
                    extract = lambda v:v["total"]["bytes"]/1000.0/1000.0,
                    success = [LogResult("myserver.system.eth0.sent")]
               ),
               SystemHealth(
                    cpu=0.90, disk=0.90, mem=0.90,
                    freq=Time.s(60),
                    fail=[Log(path="watchdog-system-failures.log")]
               ),
          )
     )
).run()

Monitoring system health

But we don't want the total amount, we just want the difference. Delta does just that.

from watchdog import *
Monitor (
     Service(
          name    = "system-health",
          monitor = (
               SystemInfo(freq=Time.s(1),
                     success = (
                          LogResult("myserver.system.mem=", extract=lambda r,_:r["memoryUsage"]),
                          LogResult("myserver.system.disk=", extract=lambda r,_:reduce(max,r["diskUsage"].values())),
                          LogResult("myserver.system.cpu=", extract=lambda r,_:r["cpuUsage"]),
                     )
               ),
               Delta(
                     Bandwidth("eth0", freq=Time.s(1)),
                     extract = lambda _:_["total"]["bytes"]/1000.0/1000.0,
                     success = [LogResult("myserver.system.eth0.sent")]
               ),
               SystemHealth(
                     cpu=0.90, disk=0.90, mem=0.90,
                     freq=Time.s(60),
                     fail=[Log(path="watchdog-system-failures.log")]
               ),
          )
     )
).run()
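
The idea behind Delta can be sketched in plain Python: remember the previous value of a cumulative counter and report only the change since the last sample. This is an illustration, not Watchdog's implementation. The `DeltaSketch` class, its `update` method, and the choice to return None for the first sample are assumptions; only the extract expression is taken from the slide.

```python
class DeltaSketch(object):
    """Turns a cumulative counter into per-sample differences (illustrative only)."""

    def __init__(self, extract):
        self.extract = extract
        self.previous = None

    def update(self, sample):
        value = self.extract(sample)
        # First sample: there is nothing to compare with yet (assumed behavior).
        delta = None if self.previous is None else value - self.previous
        self.previous = value
        return delta


# Same conversion as on the slide: total bytes to megabytes.
to_megabytes = lambda v: v["total"]["bytes"] / 1000.0 / 1000.0

# Hypothetical successive readings of a cumulative byte counter.
samples = [
    {"total": {"bytes": 5000000}},
    {"total": {"bytes": 7500000}},
    {"total": {"bytes": 7500000}},
]

delta = DeltaSketch(extract=to_megabytes)
deltas = [delta.update(s) for s in samples]
print(deltas)
```

With a sampling frequency of one second, as on the slide, each difference reads directly as megabytes per second.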

Monitoring system health

We print the result as before.

from watchdog import *
Monitor (
     Service(
          name    = "system-health",
          monitor = (
               SystemInfo(freq=Time.s(1),
                    success = (
                         LogResult("myserver.system.mem=", extract=lambda r,_:r["memoryUsage"]),
                         LogResult("myserver.system.disk=", extract=lambda r,_:reduce(max,r["diskUsage"].values())),
                         LogResult("myserver.system.cpu=", extract=lambda r,_:r["cpuUsage"]),
                    )
               ),
               Delta(
                    Bandwidth("eth0", freq=Time.s(1)),
                    extract = lambda _:_["total"]["bytes"]/1000.0/1000.0,
                    success = [LogResult("myserver.system.eth0.sent=")]
               ),
               SystemHealth(
                    cpu=0.90, disk=0.90, mem=0.90,
                    freq=Time.s(60),
                    fail=[Log(path="watchdog-system-failures.log")]
               ),
          )
     )
).run()

Monitoring system health

SystemHealth will fail whenever the usage is above the given thresholds.

from watchdog import *
Monitor (
     Service(
          name    = "system-health",
          monitor = (
               SystemInfo(freq=Time.s(1),
                    success = (
                         LogResult("myserver.system.mem=", extract=lambda r,_:r["memoryUsage"]),
                         LogResult("myserver.system.disk=", extract=lambda r,_:reduce(max,r["diskUsage"].values())),
                         LogResult("myserver.system.cpu=", extract=lambda r,_:r["cpuUsage"]),
                    )
               ),
               Delta(
                    Bandwidth("eth0", freq=Time.s(1)),
                    extract = lambda _:_["total"]["bytes"]/1000.0/1000.0,
                    success = [LogResult("myserver.system.eth0.sent=")]
               ),
               SystemHealth(
                    cpu=0.90, disk=0.90, mem=0.90,
                    freq=Time.s(60),
                    fail=[Log(path="watchdog-system-failures.log")]
               ),
          )
     )
).run()
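
The threshold rule can be illustrated with a few lines of plain Python. This is a sketch of the idea only, not Watchdog's code. The `exceeded` helper and the sample usage values are hypothetical, the metric names mirror the cpu, disk and mem keywords on the slide, and treating "above" as a strict comparison is an assumption.

```python
def exceeded(usage, thresholds):
    """Returns the sorted names of metrics whose usage is strictly above its threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if usage.get(name, 0.0) > limit
    )


# Same thresholds as on the slide: fail above 90% usage.
thresholds = {"cpu": 0.90, "disk": 0.90, "mem": 0.90}

# Hypothetical usage readings.
healthy = exceeded({"cpu": 0.35, "disk": 0.50, "mem": 0.70}, thresholds)
overloaded = exceeded({"cpu": 0.97, "disk": 0.50, "mem": 0.93}, thresholds)

print(healthy)
print(overloaded)
```

An empty list corresponds to a successful check. A non-empty list names the metrics that would trigger the fail actions.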

Monitoring system health

We'll log failures in a log file.

from watchdog import *
Monitor (
     Service(
          name    = "system-health",
          monitor = (
               SystemInfo(freq=Time.s(1),
                    success = (
                         LogResult("myserver.system.mem=", extract=lambda r,_:r["memoryUsage"]),
                         LogResult("myserver.system.disk=", extract=lambda r,_:reduce(max,r["diskUsage"].values())),
                         LogResult("myserver.system.cpu=", extract=lambda r,_:r["cpuUsage"]),
                    )
               ),
               Delta(
                    Bandwidth("eth0", freq=Time.s(1)),
                    extract = lambda _:_["total"]["bytes"]/1000.0/1000.0,
                    success = [LogResult("myserver.system.eth0.sent=")]
               ),
               SystemHealth(
                    cpu=0.90, disk=0.90, mem=0.90,
                    freq=Time.s(60),
                    fail=[Log(path="watchdog-system-failures.log")]
               ),
          )
     )
).run()
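
For readers who want to see what appending failures to a file might look like, here is a minimal sketch that uses Python's standard logging module as a stand-in for Watchdog's Log action. The `make_failure_logger` helper, the logger name, the message text and the temporary directory are all assumptions. Only the file name comes from the slide.

```python
import logging
import os
import tempfile


def make_failure_logger(path):
    """Builds a logger that appends timestamped lines to the given file."""
    logger = logging.getLogger("watchdog-sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler


# Write to a temporary directory so the sketch leaves the working directory alone.
log_path = os.path.join(tempfile.mkdtemp(), "watchdog-system-failures.log")
logger, handler = make_failure_logger(log_path)

# Hypothetical failure message, as a fail action might report it.
logger.info("system-health failed: cpu=0.97 mem=0.93")
handler.close()

with open(log_path) as f:
    logged = f.read()
print(logged)
```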

Watchdog: Overview

Monitoring DSL: declarative programming to define the monitoring strategy
Wide spectrum: from data collection to incident detection
Flexible: does not impose a specific architecture


Watchdog: Use cases

Ensure service availability: test services, and stop or restart them when problems occur
Collect system statistics: log the data or send it through the network
Alert on system or service health: take action when system stats are above a threshold


Get started!

On GitHub: http://github.com/sebastien/watchdog

1 Python file
Documented API



Thank you!

www.ffctn.com
sebastien@ffctn.com
github.com/sebastien

Fabric, Cuisine and Watchdog for server administration in Python

  • 1. Fabric, Cuisine & Watchdog Sébastien Pierre, ffunction inc. @Montréal Python, February 2011 www.ffctn.com ffunction inc.
  • 2. How to use Python for Server Administration Thanks to Fabric Cuisine* & Watchdog* *custom tools ffunction inc.
  • 3. The way we use servers has changed ffunction inc.
  • 4. The era of dedicated servers Hosted in your server room or in colocation WEB DATABASE EMAIL SERVER SERVER SERVER ffunction inc.
  • 5. The era of dedicated servers Hosted in your server room or in colocation WEB DATABASE EMAIL SERVER SERVER SERVER Sysadmins typically Sysadmins typically SSH and configure SSH and configure the servers live the servers live ffunction inc.
  • 6. The era of dedicated servers Hosted in your server room or in colocation WEB DATABASE EMAIL SERVER SERVER SERVER The servers are The servers are conservatively managed, conservatively managed, updates are risky updates are risky ffunction inc.
  • 7. The era of slices/VPS Linode.com Amazon Ec2 SLICESLICE SLICE 1 1 1 SLICE 1 SLICESLICE 6 1 SLICE SLICE 11 10 SLICE 9 We now have multiple We now have multiple small virtual servers small virtual servers (slices/VPS) (slices/VPS) ffunction inc.
  • 8. The era of slices/VPS Linode.com Amazon Ec2 SLICESLICE SLICE 1 1 1 SLICE 1 SLICESLICE 6 1 SLICE SLICE 11 10 SLICE 9 Often located in different Often located in different data-centers data-centers ffunction inc.
  • 9. The era of slices/VPS Linode.com Amazon Ec2 SLICESLICE SLICE 1 1 1 SLICE 1 SLICESLICE 6 1 SLICE SLICE 11 10 SLICE 9 ...and sometimes with ...and sometimes with different providers different providers ffunction inc.
  • 10. The era of slices/VPS Linode.com Amazon Ec2 SLICESLICE SLICE 1 1 1 SLICE 1 SLICESLICE 6 1 SLICE SLICE 11 10 SLICE 9 IWeb.com We even sometimes DEDICATED DEDICATED We even sometimes still have physical, SERVER 1 SERVER 2 still have physical, dedicated servers dedicated servers ffunction inc.
  • 11. The challenge: ORDER SERVER, SETUP SERVER, DEPLOY APPLICATION
  • 12. The challenge: ORDER SERVER, SETUP SERVER, DEPLOY APPLICATION. MAKE THIS PROCESS AS FAST (AND SIMPLE) AS POSSIBLE
  • 13. The challenge: ORDER SERVER, SETUP SERVER, DEPLOY APPLICATION. MAKE THIS PROCESS AS FAST (AND SIMPLE) AS POSSIBLE. Callout: create users and groups, customize config files, install base packages.
  • 14. The challenge: ORDER SERVER, SETUP SERVER, DEPLOY APPLICATION. MAKE THIS PROCESS AS FAST (AND SIMPLE) AS POSSIBLE. Callout: install app-specific packages, deploy application, start services.
  • 15. The challenge
  • 16. The challenge: quickly integrate your new server in the existing architecture
  • 17. The challenge: ...and make sure it's running!
  • 18. Today's menu. FABRIC: interact with your remote machines as if they were local. CUISINE: takes care of users, groups, packages and configuration of your new machine. WATCHDOG: ensures that your servers and services are up and running.
  • 19. Today's menu. FABRIC: interact with your remote machines as if they were local. CUISINE: takes care of users, groups, packages and configuration of your new machine. WATCHDOG: ensures that your servers and services are up and running. Callout: Made by ffunction inc.
  • 20. Part 1: Fabric - http://fabfile.org - application deployment & systems administration tasks
  • 21. Fabric is a Python library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.
  • 22. Wait... what does that mean? Fabric is a Python library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.
  • 23. Streamlining SSH. By hand: version = os.popen("ssh myserver 'cat /proc/version'").read() Using Fabric: version = run("cat /proc/version")
  • 24. Streamlining SSH. By hand: version = os.popen("ssh myserver 'cat /proc/version'").read() Using Fabric: from fabric.api import * env.hosts = ["myserver"] version = run("cat /proc/version")
  • 25. Streamlining SSH. By hand: version = os.popen("ssh myserver 'cat /proc/version'").read() Using Fabric: from fabric.api import * env.hosts = ["myserver"] version = run("cat /proc/version") Callout: You can specify multiple hosts and run the same commands across them.
  • 26. Streamlining SSH. By hand: version = os.popen("ssh myserver 'cat /proc/version'").read() Using Fabric: from fabric.api import * env.hosts = ["myserver"] version = run("cat /proc/version") Callout: Connections will be lazily created and pooled.
  • 27. Streamlining SSH. By hand: version = os.popen("ssh myserver 'cat /proc/version'").read() Using Fabric: from fabric.api import * env.hosts = ["myserver"] version = run("cat /proc/version") Callout: Failures ($STATUS) will be handled just like in Make.
  • 28. Example: Installing packages. sudo("aptitude install nginx") if run("dpkg -s %s | grep 'Status:' ; true" % package).find("installed") == -1: sudo("aptitude install '%s'" % (package))
  • 29. Example: Installing packages. sudo("aptitude install nginx") if run("dpkg -s %s | grep 'Status:' ; true" % package).find("installed") == -1: sudo("aptitude install '%s'" % (package)) Callout: It's easy to take action depending on the result.
  • 30. Example: Installing packages. sudo("aptitude install nginx") if run("dpkg -s %s | grep 'Status:' ; true" % package).find("installed") == -1: sudo("aptitude install '%s'" % (package)) Callout: Note that we add true so that the run() always succeeds* (* there are other ways...).
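The check-then-install pattern from slides 28 to 30 can be wrapped in a small helper. The following is a minimal sketch, not code from Fabric or Cuisine: `ensure_package` is a hypothetical name, and `run` and `sudo` are passed in as plain callables (in a real fabfile they would be Fabric's `run()` and `sudo()`), so the logic can be tried locally with stand-ins.

```python
def ensure_package(package, run, sudo):
    """Install `package` only when dpkg does not report it as installed.

    `run` and `sudo` take a shell command string; `run` must return the
    command output as a string. Injecting them is an assumption of this
    sketch so that it can be exercised without an SSH connection.
    """
    # "; true" forces a zero exit status even when dpkg does not know the
    # package, so we can inspect the output instead of aborting.
    status = run("dpkg -s %s | grep 'Status:' ; true" % package)
    if "installed" in status:
        return False  # already there, nothing to do
    sudo("aptitude install '%s'" % package)
    return True


# Local stand-ins that record commands instead of contacting a server.
executed = []

def fake_run(command):
    executed.append(("run", command))
    # Pretend that only nginx is installed on the "remote" machine.
    if "dpkg -s nginx " in command:
        return "Status: install ok installed"
    return ""

def fake_sudo(command):
    executed.append(("sudo", command))
    return ""

already = ensure_package("nginx", fake_run, fake_sudo)
fresh = ensure_package("mongodb", fake_run, fake_sudo)
```

With the stand-ins above, only the second call issues an install command; the first returns without touching the machine.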
  • 31. Example: retrieving system status. disk_usage = run("df -kP") mem_usage = run("cat /proc/meminfo") cpu_usage = run("cat /proc/stat") print disk_usage, mem_usage, cpu_usage
  • 32. Example: retrieving system status. disk_usage = run("df -kP") mem_usage = run("cat /proc/meminfo") cpu_usage = run("cat /proc/stat") print disk_usage, mem_usage, cpu_usage Callout: Very useful for getting live information from many different servers.
  • 33. Fabfile.py: from fabric.api import * from mysetup import * env.hosts = ["server1.myapp.com"] def setup(): install_packages("...") update_configuration() create_users() start_daemons() $ fab setup
  • 34. Fabfile.py: from fabric.api import * from mysetup import * env.hosts = ["server1.myapp.com"] def setup(): install_packages("...") update_configuration() create_users() start_daemons() $ fab setup Callout: Just like Make, you write rules that do something.
  • 35. Fabfile.py: from fabric.api import * from mysetup import * env.hosts = ["server1.myapp.com"] def setup(): install_packages("...") update_configuration() create_users() start_daemons() $ fab setup Callout: ...and you can specify on which servers the rules will run.
  • 36. Multiple hosts: env.hosts = ["db1.myapp.com", "db2.myapp.com", "db3.myapp.com"] @hosts("db1.myapp") def backup_db(): run(...)
  • 37. Roles: env.roledefs = { 'web': ['www1', 'www2', 'www3'], 'dns': ['ns1', 'ns2'] } $ fab -R web setup
  • 38. Roles: env.roledefs = { 'web': ['www1', 'www2', 'www3'], 'dns': ['ns1', 'ns2'] } $ fab -R web setup Callout: Will run the setup rule only on hosts that are members of the web role.
  • 39. What's good about Fabric? Low-level: basically an ssh() command that returns the result. Simple primitives: run(), sudo(), get(), put(), local(), prompt(), reboot(). No magic: no DSL, no abstraction, just a remote command API.
  • 40. What could be improved? Ease common admin tasks: user and group creation, file and directory operations. Abstract primitives: like install package, so that it works with different OSes. Templates: to make creating/updating configuration files easy.
  • 41. Cuisine: Chef-like functionality for Fabric
  • 42. Part 2: Cuisine
  • 43. What is Opscode's Chef? http://wiki.opscode.com/display/chef/Home Recipes: scripts/packages to install and configure services and applications. API: a DSL-like Ruby API to interact with the OS (create users, groups, install packages, etc.). Architecture: client-server or "solo" mode to push and deploy your new configurations.
  • 44. What I liked about Chef. Flexible: you can use the API or shell commands. Structured: helped me have a clear decomposition of the services installed per machine. Community: lots of recipes already available from http://cookbooks.opscode.com/
  • 45. What I didn't like. Too many files and directories: code is spread out, hard to get the big picture. Abstraction overload: API not very well documented, frequent fallbacks to plain shell scripts within the recipe. No "smart" recipe: recipes are applied all the time, even when it's not necessary.
  • 46. The question that kept coming... Django recipe: 5 files, 2 directories. What it does, in essence: sudo aptitude install apache2 python django-python
  • 47. The question that kept coming... Django recipe: 5 files, 2 directories. What it does, in essence: sudo aptitude install apache2 python django-python Callout: Is this really necessary for what I want to do?
  • 48. What I loved about Fabric. Bare metal: ssh() function, simple and elegant set of primitives. No magic: no abstraction, no model, no compilation. Two-way communication: easy to change the rule's behaviour according to the output (e.g. do not install something that's already installed).
  • 49. What I needed. Building blocks: Fabric.
  • 50. What I needed. Building blocks: Fabric, File I/O.
  • 51. What I needed. Building blocks: Fabric, File I/O, User/Group Management.
  • 52. What I needed. Building blocks: Fabric, File I/O, User/Group Management, Package Management.
  • 53. What I needed. Building blocks: Fabric, File I/O, User/Group Management, Package Management, Text processing & Templates.
  • 54. How I wanted it. Simple "flat" API: [object]_[operation], where operation is something in "create", "read", "update", "write", "remove", "ensure", etc. Driven by need: only implement a feature if I have a real need for it. No magic: everything is implemented using sh-compatible commands. No unnecessary structure: everything fits in one file, no imposed file layout.
  • 55. Cuisine: Example fabfile.py. from cuisine import * env.hosts = ["server1.myapp.com"] def setup(): package_ensure("python", "apache2", "python-django") user_ensure("admin", uid=2000) upstart_ensure("django") $ fab setup
  • 56. Cuisine: Example fabfile.py. from cuisine import * env.hosts = ["server1.myapp.com"] def setup(): package_ensure("python", "apache2", "python-django") user_ensure("admin", uid=2000) upstart_ensure("django") $ fab setup Callout: Fabric's core functions are already imported.
  • 57. Cuisine: Example fabfile.py. from cuisine import * env.hosts = ["server1.myapp.com"] def setup(): package_ensure("python", "apache2", "python-django") user_ensure("admin", uid=2000) upstart_ensure("django") $ fab setup Callout: Cuisine's API calls.
  • 58. File I/O
  • 59. Cuisine: File I/O ● file_exists: does the remote file exist? ● file_read: reads the remote file ● file_write: writes data to the remote file ● file_append: appends data to the remote file ● file_attribs: chmod & chown ● file_remove
  • 60. Cuisine: File I/O ● file_exists: does the remote file exist? ● file_read: reads the remote file ● file_write: writes data to the remote file ● file_append: appends data to the remote file ● file_attribs: chmod & chown ● file_remove Callout: Supports owner/group and mode change.
  • 61. Cuisine: File I/O (directories) ● dir_exists: does the remote directory exist? ● dir_ensure: ensures that a directory exists ● dir_attribs: chmod & chown ● dir_remove
  • 62. Cuisine: File I/O + ● file_update(location, updater=lambda _:_) package_ensure("mongodb-snapshot") def update_configuration( text ): res = [] for line in text.split("\n"): if line.strip().startswith("dbpath="): res.append("dbpath=/data/mongodb") elif line.strip().startswith("logpath="): res.append("logpath=/data/logs/mongodb.log") else: res.append(line) return "\n".join(res) file_update("/etc/mongodb.conf", update_configuration)
  • 63. Cuisine: File I/O + ● file_update(location, updater=lambda _:_) package_ensure("mongodb-snapshot") def update_configuration( text ): res = [] for line in text.split("\n"): if line.strip().startswith("dbpath="): res.append("dbpath=/data/mongodb") elif line.strip().startswith("logpath="): res.append("logpath=/data/logs/mongodb.log") else: res.append(line) return "\n".join(res) file_update("/etc/mongodb.conf", update_configuration) Callout: This replaces the values for configuration entries dbpath and logpath.
  • 64. Cuisine: File I/O + ● file_update(location, updater=lambda _:_) package_ensure("mongodb-snapshot") def update_configuration( text ): res = [] for line in text.split("\n"): if line.strip().startswith("dbpath="): res.append("dbpath=/data/mongodb") elif line.strip().startswith("logpath="): res.append("logpath=/data/logs/mongodb.log") else: res.append(line) return "\n".join(res) file_update("/etc/mongodb.conf", update_configuration) Callout: The remote file will only be changed if the content is different.
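The "only changed if the content is different" behaviour described for `file_update` can be illustrated locally. This is a sketch under assumptions, not Cuisine's implementation: `update_if_changed` is a hypothetical helper, the `read` and `write` callables stand in for remote file access, and the sample configuration text is made up. The `update_configuration` function is the one from the slide, laid out with its indentation.

```python
def update_configuration(text):
    """Rewrite the dbpath= and logpath= entries, leave other lines alone."""
    res = []
    for line in text.split("\n"):
        if line.strip().startswith("dbpath="):
            res.append("dbpath=/data/mongodb")
        elif line.strip().startswith("logpath="):
            res.append("logpath=/data/logs/mongodb.log")
        else:
            res.append(line)
    return "\n".join(res)


def update_if_changed(read, write, location, updater):
    """Read the file, apply `updater`, write back only when the text differs.

    Hypothetical stand-in for the behaviour the slide attributes to
    file_update(); `read` and `write` are injected callables.
    """
    old = read(location)
    new = updater(old)
    if new == old:
        return False
    write(location, new)
    return True


# An in-memory dict plays the role of the remote filesystem.
files = {
    "/etc/mongodb.conf":
        "dbpath=/var/lib/mongodb\nlogpath=/var/log/mongodb.log\nport=27017"
}
writes = []

def write(location, content):
    writes.append(location)
    files[location] = content

first = update_if_changed(files.get, write, "/etc/mongodb.conf", update_configuration)
second = update_if_changed(files.get, write, "/etc/mongodb.conf", update_configuration)
```

The second call finds the file already in the desired state and performs no write, which is what makes it safe to re-run a setup rule.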
  • 65. User Management
  • 66. Cuisine: User Management ● user_exists: does the user exist? ● user_create: creates the user ● user_ensure: creates the user if it doesn't exist
  • 67. Cuisine: Group Management ● group_exists: does the group exist? ● group_create: creates the group ● group_ensure: creates the group if it doesn't exist ● group_user_exists: does the user belong to the group? ● group_user_add: adds the user to the group ● group_user_ensure
  • 68. Package Management
  • 69. Cuisine: Package Management ● package_exists: is the package available? ● package_installed: is it installed? ● package_install: installs the package ● package_ensure: ...only if it's not installed ● package_upgrade: upgrades the/all package(s)
  • 70. Text & Templates
  • 71. Cuisine: Text transformation. text_ensure_line(text, lines) file_update( "/home/user/.profile", lambda _:text_ensure_line(_, "PYTHONPATH=/opt/lib/python:${PYTHONPATH};" "export PYTHONPATH" ))
  • 72. Cuisine: Text transformation. text_ensure_line(text, lines) file_update( "/home/user/.profile", lambda _:text_ensure_line(_, "PYTHONPATH=/opt/lib/python:${PYTHONPATH};" "export PYTHONPATH" )) Callout: Ensures that the PYTHONPATH variable is set and exported. If not, these lines will be appended.
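The "append the line only if it is missing" idea behind `text_ensure_line` can be sketched in a few lines of plain Python. The function below, `ensure_lines`, is a hypothetical illustration and not Cuisine's code; in particular, comparing lines after stripping surrounding whitespace is an assumption of this sketch.

```python
def ensure_lines(text, *lines):
    """Append each of `lines` to `text` unless an identical line
    (ignoring surrounding whitespace) is already present."""
    existing = [line.strip() for line in text.split("\n")]
    result = text
    for line in lines:
        if line.strip() not in existing:
            # Make sure the new line starts on its own line.
            if result and not result.endswith("\n"):
                result += "\n"
            result += line
            existing.append(line.strip())
    return result


profile = "set -o vi"
wanted = "PYTHONPATH=/opt/lib/python:${PYTHONPATH};export PYTHONPATH"

once = ensure_lines(profile, wanted)   # line is missing, so it is appended
twice = ensure_lines(once, wanted)     # line is present, text is unchanged
```

Because the second application leaves the text untouched, the function can be combined with an update-only-when-different file write without causing spurious changes.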
  • 73. Cuisine: Text transformation. text_replace_line(text, old, new, find=.., process=...) configuration = local_read("server.conf") for key, value in variables.items(): configuration, replaced = text_replace_line( configuration, key + "=", key + "=" + repr(value), process=lambda text:text.split("=")[0].strip() )
  • 74. Cuisine: Text transformation. text_replace_line(text, old, new, find=.., process=...) configuration = local_read("server.conf") for key, value in variables.items(): configuration, replaced = text_replace_line( configuration, key + "=", key + "=" + repr(value), process=lambda text:text.split("=")[0].strip() ) Callout: Replaces lines that look like VARIABLE=VALUE with the actual values from the variables dictionary.
  • 75. Cuisine: Text transformation. text_replace_line(text, old, new, find=.., process=...) configuration = local_read("server.conf") for key, value in variables.items(): configuration, replaced = text_replace_line( configuration, key + "=", key + "=" + repr(value), process=lambda text:text.split("=")[0].strip() ) Callout: The process lambda transforms input lines before comparing them. Here the lines are stripped of spaces and of their value.
  • 76. Cuisine: Text transformation. text_strip_margin(text) file_write(".profile", text_strip_margin( """ |export PATH="$HOME/bin":$PATH |set -o vi """ ))
  • 77. Cuisine: Text transformation. text_strip_margin(text) file_write(".profile", text_strip_margin( """ |export PATH="$HOME/bin":$PATH |set -o vi """ )) Callout: Everything after the | separator will be output as content. This makes it easy to embed text templates within functions.
  • 78. Cuisine: Text transformation. text_template(text, variables) text_template(text_strip_margin( """ |cd ${DAEMON_PATH} |exec ${DAEMON_EXEC_PATH} """ ), dict( DAEMON_PATH="/opt/mongodb", DAEMON_EXEC_PATH="/opt/mongodb/mongod" ))
  • 79. Cuisine: Text transformation. text_template(text, variables) text_template(text_strip_margin( """ |cd ${DAEMON_PATH} |exec ${DAEMON_EXEC_PATH} """ ), dict( DAEMON_PATH="/opt/mongodb", DAEMON_EXEC_PATH="/opt/mongodb/mongod" )) Callout: This is a simple wrapper around Python's (safe) string.Template substitution.
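Margin stripping and "safe" templating can both be reproduced with the standard library. The sketch below assumes behaviour rather than copying Cuisine: `strip_margin` and `render` are hypothetical names, dropping lines that have no `|` is an assumption, and the `${EXTRA_ARGS}` placeholder is added only to show what `safe_substitute()` does with an unknown variable.

```python
import string


def strip_margin(text, margin="|"):
    """Keep only what follows the first `margin` character on each line.

    Lines without the margin (such as the blank first and last lines of
    a triple-quoted string) are dropped.
    """
    kept = []
    for line in text.split("\n"):
        _, found, content = line.partition(margin)
        if found:
            kept.append(content)
    return "\n".join(kept)


def render(text, variables):
    """Substitute ${NAME} placeholders; unknown names are left in place
    instead of raising, which is what safe_substitute() provides."""
    return string.Template(text).safe_substitute(variables)


script = render(strip_margin(
    """
    |cd ${DAEMON_PATH}
    |exec ${DAEMON_EXEC_PATH} ${EXTRA_ARGS}
    """
), dict(
    DAEMON_PATH="/opt/mongodb",
    DAEMON_EXEC_PATH="/opt/mongodb/mongod",
))
```

Leaving unknown placeholders untouched matters for shell scripts, where `${...}` may be a legitimate shell variable rather than a template field.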
  • 80. Cuisine: Goodies ● ssh_keygen: generates DSA keys ● ssh_authorize: authorizes your key on the remote server ● mode_sudo: run() always uses sudo ● upstart_ensure: ensures the given daemon is running & more!
  • 81. Why use Cuisine? ● Simple API for remote-server manipulation: files, users, groups, packages ● Shell commands for specific tasks only: avoid problems with your shell commands by only using run() for very specific tasks ● Cuisine tasks are not stupid: *_ensure() commands won't do anything if it's not necessary
  • 82. Limitations ● Limited to sh-shells: operations will not work under csh ● Only written/tested for Ubuntu Linux: contributors could easily port commands
  • 83. Get started! On Github: http://github.com/sebastien/cuisine 1 short Python file. Documented API.
  • 84. Part 3: Watchdog - server and services monitoring
  • 85. The problem
  • 86. The problem: low disk space
  • 87. The problem: archive files, rotate logs, purge cache
  • 88. The problem: HTTP server has high latency
  • 89. The problem: restart HTTP server
  • 90. The problem: system load is too high
  • 91. The problem: re-nice important processes
  • 92. We want to be notified when incidents happen
  • 93. We want automatic actions to be taken whenever possible
  • 94. (Some of the) existing solutions. Monit, God, Supervisord, Upstart: focus on starting/restarting daemons and services. Munin, Cacti: focus on visualization of RRDTool data. Collectd: focus on collecting and publishing data.
  • 95. The ideal tool. Wide spectrum: data collection, service monitoring, actions. Easy setup and deployment: no complex installation or configuration. Flexible server architecture: can monitor local or remote processes. Customizable and extensible: from restarting daemons to monitoring whole servers.
  • 96. Hello, Watchdog! Diagram: SERVICE.
  • 97. Hello, Watchdog! Diagram: SERVICE, RULE.
  • 98. Hello, Watchdog! Diagram: SERVICE, RULE. Callout: A service is a collection of RULES.
  • 99. Hello, Watchdog! Diagram: SERVICE, RULE (HTTP request; CPU, disk, mem %; process status; I/O bandwidth).
  • 100. Hello, Watchdog! Diagram: SERVICE, RULE (HTTP request; CPU, disk, mem %; process status; I/O bandwidth). Callout: Each rule retrieves data and processes it. Rules can SUCCEED or FAIL.
  • 101. Hello, Watchdog! Diagram: SERVICE, RULE (HTTP request; CPU, disk, mem %; process status; I/O bandwidth), ACTION.
  • 102. Hello, Watchdog! Diagram: SERVICE, RULE (HTTP request; CPU, disk, mem %; process status; I/O bandwidth), ACTION (logging; XMPP, email notifications; start/stop process; ...).
  • 103. Hello, Watchdog! Diagram: SERVICE, RULE (HTTP request; CPU, disk, mem %; process status; I/O bandwidth), ACTION (logging; XMPP, email notifications; start/stop process; ...). Callout: Actions are bound to a rule and triggered on rule SUCCESS or FAILURE.
  • 104. Execution Model. Diagram: MONITOR.
  • 105. Execution Model. Diagram: MONITOR, SERVICE DEFINITION, RULE (frequency in ms).
  • 106. Execution Model. Diagram: MONITOR, SERVICE DEFINITION, RULE (frequency in ms). Callout: Services are registered in the monitor.
  • 107. Execution Model. Diagram: MONITOR, SERVICE DEFINITION, RULE (frequency in ms). Callout: Rules defined in the service are executed every N ms (frequency).
  • 108. Execution Model. Diagram: MONITOR, SERVICE DEFINITION, RULE (frequency in ms), SUCCESS and FAILURE outcomes, ACTIONS.
  • 109. Execution Model. Diagram: MONITOR, SERVICE DEFINITION, RULE (frequency in ms), SUCCESS and FAILURE outcomes, ACTIONS. Callout: If the rule SUCCEEDS, actions will be sequentially executed.
  • 110. Execution Model. Diagram: MONITOR, SERVICE DEFINITION, RULE (frequency in ms), SUCCESS and FAILURE outcomes, ACTIONS. Callout: If the rule FAILS, failure actions will be sequentially executed.
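The execution model on slides 104 to 110 (a rule runs, then either its success actions or its failure actions run in sequence) can be mimicked with a few lines of plain Python. This is a rough sketch of the idea only: the `Rule` class, `low_disk_check` and the hard-coded disk usage are all hypothetical, the periodic scheduling is left out, and treating an exception as a rule failure is an assumption of the sketch rather than Watchdog's documented behaviour.

```python
class Rule:
    """A check plus the actions to run on success and on failure."""

    def __init__(self, check, success=(), fail=()):
        self.check = check
        self.success = list(success)
        self.fail = list(fail)

    def run_once(self):
        # In this sketch a rule FAILS when its check raises an exception.
        try:
            result = self.check()
            succeeded = True
        except Exception as error:
            result = error
            succeeded = False
        # Actions bound to the outcome are executed one after the other.
        for action in (self.success if succeeded else self.fail):
            action(result)
        return succeeded


log = []

def low_disk_check():
    usage_percent = 95  # hard-coded for the example
    if usage_percent > 90:
        raise RuntimeError("disk usage at %d%%" % usage_percent)
    return usage_percent

rule = Rule(
    low_disk_check,
    success=[lambda result: log.append("ok")],
    fail=[
        lambda error: log.append("notify: %s" % error),
        lambda error: log.append("purge cache"),
    ],
)
outcome = rule.run_once()
```

A monitor in this picture would simply call `run_once()` on every registered rule at that rule's frequency.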
  • 111. Monitoring a remote machine. #!/usr/bin/env python from watchdog import * Monitor( Service( name = "google-search-latency", monitor = ( HTTP( GET="http://www.google.ca/search?q=watchdog", freq=Time.s(1), timeout=Time.ms(80), fail=[ Print("Google search query took more than 50ms") ] ) ) ) ).run()
  • 112. Monitoring a remote machine. #!/usr/bin/env python from watchdog import * Monitor( Service( name = "google-search-latency", monitor = ( HTTP( GET="http://www.google.ca/search?q=watchdog", freq=Time.s(1), timeout=Time.ms(80), fail=[ Print("Google search query took more than 50ms") ] ) ) ) ).run() Callout: A monitor is like the "main" for Watchdog. It actively monitors services.
  • 113. Monitoring a remote machine. #!/usr/bin/env python from watchdog import * Monitor( Service( name = "google-search-latency", monitor = ( HTTP( GET="http://www.google.ca/search?q=watchdog", freq=Time.s(1), timeout=Time.ms(80), fail=[ Print("Google search query took more than 50ms") ] ) ) ) ).run() Callout: Don't forget to call run() on it.
  • 114. Monitoring a remote machine. #!/usr/bin/env python from watchdog import * Monitor( Service( name = "google-search-latency", monitor = ( HTTP( GET="http://www.google.ca/search?q=watchdog", freq=Time.s(1), timeout=Time.ms(80), fail=[ Print("Google search query took more than 50ms") ] ) ) ) ).run() Callout: The service monitors the rules.
  • 115. Monitoring a remote machine. #!/usr/bin/env python from watchdog import * Monitor( Service( name = "google-search-latency", monitor = ( HTTP( GET="http://www.google.ca/search?q=watchdog", freq=Time.s(1), timeout=Time.ms(80), fail=[ Print("Google search query took more than 50ms") ] ) ) ) ).run() Callouts: The HTTP rule lets you test a URL. And we display a message in case of failure.
  • 116. Monitoring a remote machine. #!/usr/bin/env python from watchdog import * Monitor( Service( name = "google-search-latency", monitor = ( HTTP( GET="http://www.google.ca/search?q=watchdog", freq=Time.s(1), timeout=Time.ms(80), fail=[ Print("Google search query took more than 50ms") ] ) ) ) ).run() Callout: If there is a 4XX or it times out, the rule will fail and display an error message.
  • 117. Monitoring a remote machine. $ python example-service-monitoring.py 2011-02-27T22:33:18 watchdog --- #0 (runners=1,threads=2,duration=0.57s) 2011-02-27T22:33:18 watchdog [!] Failure on HTTP(GET="www.google.ca:80/search?q=watchdog",timeout=0.08) : Socket error: timed out Google search query took more than 50ms 2011-02-27T22:33:19 watchdog --- #1 (runners=1,threads=2,duration=0.73s) 2011-02-27T22:33:20 watchdog --- #2 (runners=1,threads=2,duration=0.54s) 2011-02-27T22:33:21 watchdog --- #3 (runners=1,threads=2,duration=0.69s) 2011-02-27T22:33:22 watchdog --- #4 (runners=1,threads=2,duration=0.77s) 2011-02-27T22:33:23 watchdog --- #5 (runners=1,threads=2,duration=0.70s)
  • 118. Sending Email Notification. send_email = Email( "notifications@ffctn.com", "[Watchdog]Google Search Latency Error", "Latency was over 80ms", "smtp.gmail.com", "myusername", "mypassword" ) […] HTTP( GET="http://www.google.ca/search?q=watchdog", freq=Time.s(1), timeout=Time.ms(80), fail=[ send_email ] )
  • 119. Sending Email Notification. send_email = Email( "notifications@ffctn.com", "[Watchdog]Google Search Latency Error", "Latency was over 80ms", "smtp.gmail.com", "myusername", "mypassword" ) […] HTTP( GET="http://www.google.ca/search?q=watchdog", freq=Time.s(1), timeout=Time.ms(80), fail=[ send_email ] ) Callout: The Email rule will send an email to notifications@ffctn.com when triggered.
  • 120. Sending Email Notification. send_email = Email( "notifications@ffctn.com", "[Watchdog]Google Search Latency Error", "Latency was over 80ms", "smtp.gmail.com", "myusername", "mypassword" ) […] HTTP( GET="http://www.google.ca/search?q=watchdog", freq=Time.s(1), timeout=Time.ms(80), fail=[ send_email ] ) Callout: This is how we bind the action to the rule failure.
  • 121. Sending Email+Jabber Notification. send_xmpp = XMPP( "notifications@jabber.org", "Watchdog: Google search latency over 80ms", "myuser@jabber.org", "myspassword" ) […] HTTP( GET="http://www.google.ca/search?q=watchdog", freq=Time.s(1), timeout=Time.ms(80), fail=[ send_email, send_xmpp ] )
  • 122. Monitoring incident: when something fails repeatedly during a given period of time
  • 123. Monitoring incident: when something fails repeatedly during a given period of time. Callout: You don't want to be notified all the time, only when it really matters.
  • 124. Detecting incidents. HTTP( GET="http://www.google.ca/search?q=watchdog", freq=Time.s(1), timeout=Time.ms(80), fail=[ Incident( errors = 5, during = Time.s(10), actions = [send_email,send_xmpp] ) ] )
  • 125. Detecting incidents. HTTP( GET="http://www.google.ca/search?q=watchdog", freq=Time.s(1), timeout=Time.ms(80), fail=[ Incident( errors = 5, during = Time.s(10), actions = [send_email,send_xmpp] ) ] ) Callout: An incident is a "smart" action: it will only do something when the condition is met.
  • 126. Detecting incidents. HTTP( GET="http://www.google.ca/search?q=watchdog", freq=Time.s(1), timeout=Time.ms(80), fail=[ Incident( errors = 5, during = Time.s(10), actions = [send_email,send_xmpp] ) ] ) Callout: When at least 5 errors...
  • 127. Detecting incidents. HTTP( GET="http://www.google.ca/search?q=watchdog", freq=Time.s(1), timeout=Time.ms(80), fail=[ Incident( errors = 5, during = Time.s(10), actions = [send_email,send_xmpp] ) ] ) Callout: ...happen over a 10-second period
  • 128. Detecting incidents. HTTP( GET="http://www.google.ca/search?q=watchdog", freq=Time.s(1), timeout=Time.ms(80), fail=[ Incident( errors = 5, during = Time.s(10), actions = [send_email,send_xmpp] ) ] ) Callout: The Incident action will trigger the given actions.
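The incident condition (at least 5 errors within 10 seconds) amounts to counting failures in a sliding time window. The class below is a hypothetical sketch of that idea, not Watchdog's `Incident`: the name `IncidentWindow`, passing timestamps in explicitly, and resetting the counter after the actions fire are all assumptions made for illustration.

```python
from collections import deque


class IncidentWindow:
    """Run `actions` once `errors` failures occur within `during` seconds."""

    def __init__(self, errors, during, actions):
        self.errors = errors
        self.during = during
        self.actions = actions
        self.timestamps = deque()

    def record_failure(self, now):
        """Register a failure at time `now` (seconds); return True when
        the incident condition is met and the actions were run."""
        self.timestamps.append(now)
        # Forget failures that fell out of the time window.
        while self.timestamps and now - self.timestamps[0] > self.during:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.errors:
            for action in self.actions:
                action()
            self.timestamps.clear()  # start counting afresh (assumption)
            return True
        return False


sent = []
incident = IncidentWindow(
    errors=5,
    during=10,
    actions=[lambda: sent.append("email"), lambda: sent.append("xmpp")],
)

# Four failures spread over a minute: never 5 inside a 10 second window.
sparse = [incident.record_failure(t) for t in (0, 20, 40, 60)]
# Five failures one second apart: the fifth one triggers the actions.
burst = [incident.record_failure(t) for t in (100, 101, 102, 103, 104)]
```

Isolated failures therefore stay silent, and a notification goes out only when failures cluster in time.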
  • 129. Example: Ensuring a service is running. from watchdog import * Monitor( Service( name="myservice-ensure-up", monitor=( HTTP( GET="http://localhost:8000/", freq=Time.ms(500), fail=[ Incident( errors=5, during=Time.s(5), actions=[ Restart("myservice-start.py") ])] )))).run()
  • 130. Example: Ensuring a service is running. from watchdog import * Monitor( Service( name="myservice-ensure-up", monitor=( HTTP( GET="http://localhost:8000/", freq=Time.ms(500), fail=[ Incident( errors=5, during=Time.s(5), actions=[ Restart("myservice-start.py") ])] )))).run() Callout: We test if we can GET http://localhost:8000 within 500ms.
  • 131. Example: Ensuring a service is running. from watchdog import * Monitor( Service( name="myservice-ensure-up", monitor=( HTTP( GET="http://localhost:8000/", freq=Time.ms(500), fail=[ Incident( errors=5, during=Time.s(5), actions=[ Restart("myservice-start.py") ])] )))).run() Callout: If we can't reach it during 5 seconds...
  • 132. Example: Ensuring a service is running. from watchdog import * Monitor( Service( name="myservice-ensure-up", monitor=( HTTP( GET="http://localhost:8000/", freq=Time.ms(500), fail=[ Incident( errors=5, during=Time.s(5), actions=[ Restart("myservice-start.py") ])] )))).run() Callout: We kill and restart myservice-start.py.
  • 133. Example: Monitoring system health from watchdog import * Monitor ( Service( name = "system-health", monitor = ( SystemInfo(freq=Time.s(1), success = ( LogResult("myserver.system.mem", extract=lambda r,_:r["memoryUsage"]), LogResult("myserver.system.disk", extract=lambda r,_:reduce(max,r["diskUsage"].values())), LogResult("myserver.system.cpu", extract=lambda r,_:r["cpuUsage"]), ) ), Delta( Bandwidth("eth0", freq=Time.s(1)), extract = lambda v:v["total"]["bytes"]/1000.0/1000.0, success = [LogResult("myserver.system.eth0.sent")] ), SystemHealth( cpu=0.90, disk=0.90, mem=0.90, freq=Time.s(60), fail=[Log(path="watchdog-system-failures.log")] ), ) ) ).run() ffunction inc.
  • 134. Monitoring system health from watchdog import * Monitor ( Service( name = "system-health", monitor = ( SystemInfo(freq=Time.s(1), success = ( LogResult("myserver.system.mem", extract=lambda r,_:r["memoryUsage"]), LogResult("myserver.system.disk", extract=lambda r,_:reduce(max,r["diskUsage"].values())), LogResult("myserver.system.cpu", extract=lambda r,_:r["cpuUsage"]), ) ), Delta( Bandwidth("eth0", freq=Time.s(1)), extract = lambda v:v["total"]["bytes"]/1000.0/1000.0, success = [LogResult("myserver.system.eth0.sent")] ), SystemHealth( cpu=0.90, disk=0.90, mem=0.90, freq=Time.s(60), fail=[Log(path="watchdog-system-failures.log")] ), ) ) ).run() ffunction inc.
  • 135. Monitoring system health SystemInfo will retrieve SystemInfo will retrieve system information and system information and from watchdog import * return it as a dictionary Monitor ( return it as a dictionary Service( name = "system-health", monitor = ( SystemInfo(freq=Time.s(1), success = ( LogResult("myserver.system.mem", extract=lambda r,_:r["memoryUsage"]), LogResult("myserver.system.disk", extract=lambda r,_:reduce(max,r["diskUsage"].values())), LogResult("myserver.system.cpu", extract=lambda r,_:r["cpuUsage"]), ) ), Delta( Bandwidth("eth0", freq=Time.s(1)), extract = lambda v:v["total"]["bytes"]/1000.0/1000.0, success = [LogResult("myserver.system.eth0.sent")] ), SystemHealth( cpu=0.90, disk=0.90, mem=0.90, freq=Time.s(60), fail=[Log(path="watchdog-system-failures.log")] ), ) ) ).run() ffunction inc.
  • 136. Monitoring system health We log each result by We log each result by extracting the given from watchdog import * extracting the given value from the result Monitor ( value from the result Service( dictionary (memoryUsage, name = "system-health", dictionary (memoryUsage, diskUsage,cpuUsage) monitor = ( diskUsage,cpuUsage) SystemInfo(freq=Time.s(1), success = ( LogResult("myserver.system.mem=", extract=lambda r,_:r["memoryUsage"]), LogResult("myserver.system.disk=", extract=lambda r,_:reduce(max,r["diskUsage"].values())), LogResult("myserver.system.cpu=", extract=lambda r,_:r["cpuUsage"]), ) ), Delta( Bandwidth("eth0", freq=Time.s(1)), extract = lambda v:v["total"]["bytes"]/1000.0/1000.0, success = [LogResult("myserver.system.eth0.sent")] ), SystemHealth( cpu=0.90, disk=0.90, mem=0.90, freq=Time.s(60), fail=[Log(path="watchdog-system-failures.log")] ), ) ) ).run() ffunction inc.
  • 137. Monitoring system health
      Bandwidth collects network interface live traffic information.
      from watchdog import *
      Monitor (
          Service(
              name    = "system-health",
              monitor = (
                  SystemInfo(freq=Time.s(1),
                      success = (
                          LogResult("myserver.system.mem=", extract=lambda r,_:r["memoryUsage"]),
                          LogResult("myserver.system.disk=", extract=lambda r,_:reduce(max,r["diskUsage"].values())),
                          LogResult("myserver.system.cpu=", extract=lambda r,_:r["cpuUsage"]),
                      )
                  ),
                  Delta(
                      Bandwidth("eth0", freq=Time.s(1)),
                      extract = lambda v:v["total"]["bytes"]/1000.0/1000.0,
                      success = [LogResult("myserver.system.eth0.sent")]
                  ),
                  SystemHealth(
                      cpu=0.90, disk=0.90, mem=0.90,
                      freq=Time.s(60),
                      fail=[Log(path="watchdog-system-failures.log")]
                  ),
              )
          )
      ).run()
  • 138. Monitoring system health
      But we don't want the total amount, we just want the difference. Delta does just that.
      from watchdog import *
      Monitor (
          Service(
              name    = "system-health",
              monitor = (
                  SystemInfo(freq=Time.s(1),
                      success = (
                          LogResult("myserver.system.mem=", extract=lambda r,_:r["memoryUsage"]),
                          LogResult("myserver.system.disk=", extract=lambda r,_:reduce(max,r["diskUsage"].values())),
                          LogResult("myserver.system.cpu=", extract=lambda r,_:r["cpuUsage"]),
                      )
                  ),
                  Delta(
                      Bandwidth("eth0", freq=Time.s(1)),
                      extract = lambda _:_["total"]["bytes"]/1000.0/1000.0,
                      success = [LogResult("myserver.system.eth0.sent")]
                  ),
                  SystemHealth(
                      cpu=0.90, disk=0.90, mem=0.90,
                      freq=Time.s(60),
                      fail=[Log(path="watchdog-system-failures.log")]
                  ),
              )
          )
      ).run()
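The idea of turning a running total into a per-sample difference can be sketched without Watchdog. The class below is a hypothetical illustration, not Watchdog's `Delta` implementation: `DeltaSketch`, its `update` method, and the sample byte counts are all invented for the example. Only the `extract` lambda (total bytes converted to megabytes) is taken from the slide.

```python
class DeltaSketch:
    """Illustrative only: remembers the previous value and returns the change."""

    def __init__(self, extract):
        self.extract = extract
        self.previous = None

    def update(self, sample):
        current = self.extract(sample)
        # The first sample has nothing to compare against, so there is no delta yet.
        delta = None if self.previous is None else current - self.previous
        self.previous = current
        return delta


# Same conversion as on the slide: total bytes -> megabytes.
to_megabytes = lambda v: v["total"]["bytes"] / 1000.0 / 1000.0

# Hypothetical cumulative counters, as an interface might report them.
samples = [
    {"total": {"bytes": 1000000}},
    {"total": {"bytes": 3500000}},
    {"total": {"bytes": 4000000}},
]

delta = DeltaSketch(to_megabytes)
deltas = [delta.update(s) for s in samples]
print(deltas)
```

With a sampling frequency of one second, as in `Bandwidth("eth0", freq=Time.s(1))`, each difference would read as megabytes transferred during that second.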
  • 139. Monitoring system health
      We print the result as before.
      from watchdog import *
      Monitor (
          Service(
              name    = "system-health",
              monitor = (
                  SystemInfo(freq=Time.s(1),
                      success = (
                          LogResult("myserver.system.mem=", extract=lambda r,_:r["memoryUsage"]),
                          LogResult("myserver.system.disk=", extract=lambda r,_:reduce(max,r["diskUsage"].values())),
                          LogResult("myserver.system.cpu=", extract=lambda r,_:r["cpuUsage"]),
                      )
                  ),
                  Delta(
                      Bandwidth("eth0", freq=Time.s(1)),
                      extract = lambda _:_["total"]["bytes"]/1000.0/1000.0,
                      success = [LogResult("myserver.system.eth0.sent=")]
                  ),
                  SystemHealth(
                      cpu=0.90, disk=0.90, mem=0.90,
                      freq=Time.s(60),
                      fail=[Log(path="watchdog-system-failures.log")]
                  ),
              )
          )
      ).run()
  • 140. Monitoring system health
      SystemHealth will fail whenever the usage is above the given thresholds.
      from watchdog import *
      Monitor (
          Service(
              name    = "system-health",
              monitor = (
                  SystemInfo(freq=Time.s(1),
                      success = (
                          LogResult("myserver.system.mem=", extract=lambda r,_:r["memoryUsage"]),
                          LogResult("myserver.system.disk=", extract=lambda r,_:reduce(max,r["diskUsage"].values())),
                          LogResult("myserver.system.cpu=", extract=lambda r,_:r["cpuUsage"]),
                      )
                  ),
                  Delta(
                      Bandwidth("eth0", freq=Time.s(1)),
                      extract = lambda _:_["total"]["bytes"]/1000.0/1000.0,
                      success = [LogResult("myserver.system.eth0.sent=")]
                  ),
                  SystemHealth(
                      cpu=0.90, disk=0.90, mem=0.90,
                      freq=Time.s(60),
                      fail=[Log(path="watchdog-system-failures.log")]
                  ),
              )
          )
      ).run()
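A threshold check of this kind can be sketched as a plain function. This is a hypothetical illustration rather than Watchdog's `SystemHealth` code: the function name `exceeded_thresholds` is invented, usages are assumed to be fractions between 0 and 1, and disk usage is assumed to be judged by the fullest partition. The 0.90 defaults mirror the values on the slide.

```python
def exceeded_thresholds(info, cpu=0.90, disk=0.90, mem=0.90):
    """Return the names of the metrics that are above their threshold."""
    failures = []
    if info["cpuUsage"] > cpu:
        failures.append("cpu")
    # Assumption: the fullest partition decides whether disk usage fails.
    if max(info["diskUsage"].values()) > disk:
        failures.append("disk")
    if info["memoryUsage"] > mem:
        failures.append("mem")
    return failures


# Hypothetical samples using the key names shown on the slides.
healthy = {"cpuUsage": 0.20, "memoryUsage": 0.50, "diskUsage": {"/": 0.40}}
stressed = {"cpuUsage": 0.95, "memoryUsage": 0.50, "diskUsage": {"/": 0.20, "/var": 0.97}}

print(exceeded_thresholds(healthy))
print(exceeded_thresholds(stressed))
```

A non-empty list corresponds to the failure case, which is where a `fail` action such as writing to a log file would run.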
  • 141. Monitoring system health
      We'll log failures in a log file.
      from watchdog import *
      Monitor (
          Service(
              name    = "system-health",
              monitor = (
                  SystemInfo(freq=Time.s(1),
                      success = (
                          LogResult("myserver.system.mem=", extract=lambda r,_:r["memoryUsage"]),
                          LogResult("myserver.system.disk=", extract=lambda r,_:reduce(max,r["diskUsage"].values())),
                          LogResult("myserver.system.cpu=", extract=lambda r,_:r["cpuUsage"]),
                      )
                  ),
                  Delta(
                      Bandwidth("eth0", freq=Time.s(1)),
                      extract = lambda _:_["total"]["bytes"]/1000.0/1000.0,
                      success = [LogResult("myserver.system.eth0.sent=")]
                  ),
                  SystemHealth(
                      cpu=0.90, disk=0.90, mem=0.90,
                      freq=Time.s(60),
                      fail=[Log(path="watchdog-system-failures.log")]
                  ),
              )
          )
      ).run()
  • 142. Watchdog: Overview
      Monitoring DSL: declarative programming to define a monitoring strategy
      Wide spectrum: from data collection to incident detection
      Flexible: does not impose a specific architecture
  • 143. Watchdog: Use cases
      Ensure service availability: test services and stop/restart them when problems occur
      Collect system statistics: log or send data through the network
      Alert on system or service health: take action when system stats are above a threshold
  • 144. Get started!
      On GitHub: http://github.com/sebastien/watchdog
      1 Python file
      Documented API
  • 145. Thank you!
      www.ffctn.com
      sebastien@ffctn.com
      github.com/sebastien
      ffunction inc.