SlideShare une entreprise Scribd logo
1  sur  32
Télécharger pour lire hors ligne
CERN  Data  Centre  Evolution
Gavin  McCance
gavin.mccance@cern.ch
@gmccance
SDCD12:  Supporting  Science  with  Cloud  Computing
Bern
19th November  2012
What  is  CERN  ?
Gavin  McCance,  CERN 2
• Conseil Européen pour  la  
Recherche Nucléaire – aka  
European  Laboratory  for  
Particle  Physics
• Between  Geneva  and  the  
Jura  mountains,  straddling  
the  Swiss-­‐French  border
• Founded  in  1954  with  an  
international  treaty
• Our  business  is  fundamental  
physics  ,  what  is  the  
universe  made  of  and  how  
does  it  work
Gavin  McCance,  CERN 3
Answering fundamental questions…
• How  to  explain particles have  mass?
We have  theories and  accumulating experimental evidence..  Getting close…
• What is 96%  of  the  universe made  of  ?
We can only see 4%  of  its estimated mass!
• Why isn’t there anti-­‐matter
in  the  universe?
Nature  should be symmetric…
• What was the  state  of  matter just
after the  « Big Bang »  ?
Travelling  back  to  the  earliest instants  of
the  universe would help…
4
The  Large  Hadron  Collider  (LHC)  tunnel
Gavin  McCance,  CERN
Gavin  McCance,  CERN 5
Gavin  McCance,  CERN 6
• Data  Centre  by  Numbers
– Hardware  installation  &  retirement
• ~7,000  hardware  movements/year;  ~1,800  disk  failures/year
Xeon  
5150
2%
Xeon  
5160
10%
Xeon  
E5335
7%
Xeon  
E5345
14%
Xeon  
E5405
6%
Xeon  
E5410
16%
Xeon  
L5420
8%
Xeon  
L5520
33%
Xeon  
3GHz
4%
Fujitsu
3%
Hitachi
23%
HP
0%
Maxtor
0%
Seagate
15%
Western  
Digital
59%
Other
0%
High  Speed  Routers
(640 Mbps  →  2.4  Tbps)
24
Ethernet  Switches 350
10  Gbps  ports 2,000
Switching  Capacity 4.8 Tbps
1  Gbps  ports 16,939
10  Gbps  ports 558
Racks 828
Servers 11,728
Processors 15,694
Cores 64,238
HEPSpec06 482,507
Disks 64,109
Raw  disk  capacity  (TiB) 63,289
Memory  modules 56,014
Memory  capacity  (TiB) 158
RAID  controllers 3,749
Tape  Drives 160
Tape  Cartridges 45,000
Tape  slots 56,000
Tape Capacity  (TiB) 73,000
IT  Power  Consumption 2,456  KW
Total Power  Consumption 3,890  KW
Current  infrastructure
• Around  12k  servers
– Dedicated  compute,  dedicated  disk  server,  dedicated  service  nodes
– Majority  Scientific  Linux  (RHEL5/6  clone)
– Mostly  running  on  real  hardware
– Last  couple  of  years,  we’ve  consolidated  some  of  the  service  nodes  
onto  Microsoft  HyperV
– Various  other  virtualisation  projects  around
• In  2002  we  developed  our  own  management  toolset
– Quattor /  CDB  configuration  tool
– Lemon  computer  monitoring
– Open  source,  but  a  small  community
Gavin  McCance,  CERN 7
• Many  diverse  applications  (”clusters”)  
• Managed  by  different  teams  (CERN  IT  +  experiment  groups)
Gavin  McCance,  CERN 8
New  data  centre  to  expand  capacity
Gavin  McCance,  CERN 9
• Data  centre  in  Geneva  
at  the  limit  of  
electrical  capacity  at  
3.5MW
• New  centre  chosen  in  
Budapest,  Hungary
• Additional  2.7MW  of  
usable  power
• Hands  off  facility
• Deploying  from  2013  
with  200Gbit/s  
network  to  CERN
Time  to  change  strategy
• Rationale
– Need  to  manage  twice  the  servers  as  today
– No  increase  in  staff  numbers
– Tools  becoming  increasingly  brittle  and  will  not  scale  as-­‐is
• Approach
– CERN  is  no  longer  a  special  case  for  compute
– Adopt  an  open  source  tool  chain  model
– Our  engineers  rapidly  iterate
• Evaluate  solutions  in  the  problem  domain
• Identify  functional  gaps  and  challenge  old  assumptions
• Select  first  choice  but  be  prepared  to  change  in  future
– Contribute  new  function  back  to  the  community
Gavin  McCance,  CERN 10
Building  Blocks
Gavin  McCance,  CERN 11
Bamboo  
Koji,  Mock
AIMS/PXE
Foreman
Yum  repo
Pulp
Puppet-­DB
mcollective,  yum
JIRA
Lemon  /
Hadoop
git
OpenStack  
Nova
Hardware  
database
Puppet
Active  Directory  /
LDAP
Choose  Puppet  for  Configuration
• The  tool  space  has  exploded  in  last  few  years
– In  configuration  management  and  operations
• Puppet and  Chef are  the  clear  leaders  for  ‘core  tools’
• Many  large  enterprises  now  use  Puppet
– Its  declarative  approach  fits  what  we’re  used  to  at  CERN
– Large  installations:  friendly,  wide-­‐based  community
– You  can  buy  books  on  it
– You  can  employ  people  who  know  it  better  than  do
Gavin  McCance,  CERN 12
Puppet  Experience
• Excellent:  basic  puppet  is  easy  to  setup
and  can  be  scaled-­‐up  well
• Well  documented,  configuring  services  with  it  is  easy
• Handle  our  cluster  diversity  and  dynamic  clouds  well
• Lots  of  resource  (“modules”)  online,  though  of  varying  quality
• Large,  responsive  community  to  help
• Lots  of  nice  tooling  for  free
– Configuration  version  control  and  branching:  integrates  well  with  git
– Dashboard:  we  use  the  Foreman dashboard
• We’re  moving  all  our  production  service  over  in  2013
Gavin  McCance,  CERN 13
Gavin  McCance,  CERN 14
Preparing  the  move  to  cloud
• Improve  operational  efficiency  and  dynamicness
– Dynamic  multiple  operating  system  demand
– Dynamic  temporary  load  spikes  for  special  activities
– Hardware  interventions  with  long  running  programs  (live  migration)
• Improve  resource  efficiency
– Exploit  idle  resources,  especially  waiting  for  disk  and  tape  I/O
– Highly  variable  load  such  as  interactive  or  build  machines
• Enable  cloud  architectures
– Gradual  migration  from  traditional  batch  +  disk  to  cloud  interfaces  and  
workflows
• Improve  responsiveness
– Self-­‐Service  with  coffee  break  response  time
Gavin  McCance,  CERN 15
What  is  OpenStack  ?
• OpenStack  is  a  cloud  operating  system  that  controls  large  
pools  of  compute,  storage,  and  networking  resources  
throughout  a  datacenter,  all  managed  through  a  dashboard  
that  gives  administrators  control  while  empowering  their  users  
to  provision  resources  through  a  web  interface
Gavin  McCance,  CERN 16
Service  Model
Gavin  McCance,  CERN 17
• Pets  are  given  names  like  
pussinboots.cern.ch  
• They  are  unique,  lovingly  hand  raised  
and  cared  for
• When  they  get  ill,  you  nurse  them  back  
to  health
• Cattle  are  given  numbers  like  
vm0042.cern.ch
• They  are  almost  identical  to  other  cattle
• When  they  get  ill,  you  get  another  one
• Future  application  architectures  should  use  Cattle  but  Pets  with  
strong  configuration  management  are  viable  and  still  needed
Borrowed  from
@randybias at  Cloudscaling
http://www.slideshare.net/randybias/the-­‐cloud-­‐
revolution-­‐cyber-­‐press-­‐forum-­‐philippines
Basic  Openstack Components
Gavin  McCance,  CERN 18
Compute Scheduler
NetworkVolume
Registry Image
KEYSTONE HORIZON
NOVAGLANCE
• Each  component  has  an  API  and  is  pluggable
• Other  non-­‐core  projects  interact  with  these  components  
Supporting  the  Pets  with  OpenStack
• Network
– Interfacing  with  legacy  site  DNS  and  IP  management
– Ensuring  Kerberos  identity  before  VM  start
• Puppet
– Ease  use  of  configuration  management  tools  with  our  users
– Exploit  mcollective  for  orchestration/delegation
• External  Block  Storage
– Currently  using  nova-­‐volume  with  Gluster backing  store
• Live  migration  to  maximise  availability
– KVM  live  migration  using  Gluster
– KVM  and  Hyper-­‐V  block  migration
Gavin  McCance,  CERN 19
Current  Status  of  OpenStack  at  CERN
• Working  on  an  Essex  code  base  from  the  EPEL  repository
– Excellent  experience  with  the  Fedora  cloud-­‐sig  team
– Cloud-­‐init for  contextualisation,  oz for  images  with  RHEL/Fedora
• Components
– Current  focus  is  on  Nova  with  KVM  and  Hyper-­‐V
– Keystone  running  with  Active  Directory  and  Glance  for  Linux  and  
Windows  images
• Pre-­‐production  facility  with  around  200  Hypervisors,    with  
2000  VMs  integrated  with  CERN  infrastructure
– used  for  simulation  of  magnet  placement  using  LHC@Home and  batch  
physics  programs
Gavin  McCance,  CERN 20
Gavin  McCance,  CERN 21
Next  Steps
• Deploy  into  production  at  the  start  of  2013  with  Folsom  running  
production  services  and  compute  on  top  of  OpenStack  IaaS
• Support  multi-­‐site  operations  with  2nd data  centre  in  Hungary
• Exploit  new  functionality
– Ceilometer  for  metering
– Bare  metal  for  non-­‐virtualised  use  cases  such  as  high  I/O  servers
– X.509  user  certificate  authentication
– Load  balancing  as  a  service
Ramping  to  15K  hypervisors  with  100K  
VMs  by  2015  
Gavin  McCance,  CERN 22
Conclusions
• CERN  computer  centre  is  expanding
• We’re  in  the  process  of  refurbishing  the  tools  we  use  
to  manage  the  centre  based  on  Openstack for  IaaS
and  Puppet for  configuration  management
• Production  at  CERN  in  next  few  months  on  Folsom
– Gradual  migration  of  all  our  services
• Community  is  key  to  shared  success
– CERN  contributes  and  benefits
Gavin  McCance,  CERN 23
BACKUP  SLIDES
Gavin  McCance,  CERN 24
Training  and  Support
• Buy  the  book  rather  than  guru  mentoring
• Follow  the  mailing  lists  to  learn
• Newcomers  are  rapidly  productive  (and  often  know  more  than  us)
• Community  and  Enterprise  support  means  we’re  not  on  our  own
Gavin  McCance,  CERN 25
Staff  Motivation
• Skills  valuable  outside  of  CERN  when  an  engineer’s  contracts  
end
Gavin  McCance,  CERN 26
When  communities  combine…
• OpenStack’s  many  components  and  options  make  
configuration  complex  out  of  the  box
• Puppet  forge module  from  PuppetLabs  does  our  configuration
• The  Foreman  adds  OpenStack  provisioning  for  user  kiosk  to  a  
configured  machine  in  15  minutes
Gavin  McCance,  CERN 27
Foreman  to  manage  Puppetized VM
Gavin  McCance,  CERN 28
Active  Directory  Integration
• CERN’s  Active  Directory
– Unified  identity  management  across  the  site
– 44,000  users
– 29,000  groups
– 200  arrivals/departures  per  month
• Full  integration  with  Active  Directory  via  LDAP
– Uses  the  OpenLDAP backend  with  some  particular  configuration  
settings
– Aim  for  minimal  changes  to  Active  Directory
– 7  patches  submitted  around  hard  coded  values  and  additional  filtering
• Now  in  use  in  our  pre-­‐production  instance
– Map  project  roles  (admins,  members)  to  groups
– Documentation  in  the  OpenStack  wiki
Gavin  McCance,  CERN 29
What  are  we  missing  (or  haven’t  found  yet)  ?
• Best  practice  for
– Monitoring  and  KPIs  as  part  of  core  functionality
– Guest  disaster  recovery
– Migration  between  versions  of  OpenStack
• Roles  within  multi-­‐user  projects
– VM  owner  allowed  to  manage  their  own  resources  (start/stop/delete)
– Project  admins  allowed  to  manage  all  resources
– Other  members  should  not  have  high  rights  over  other  members  VMs
• Global  quota  management  for  non-­‐elastic  private  cloud
– Manage  resource  prioritisation  and  allocation  centrally
– Capacity  management  /  utilisation  for  planning
Gavin  McCance,  CERN 30
Opportunistic  Clouds  in  online  experiment  farms
• The  CERN  experiments  have  farms  of  1000s  of  Linux  servers  
close  to  the  detectors  to  filter  the  1PByte/s  down  to  6GByte/s  
to  be  recorded  to  tape
• When  the  accelerator  is  not  running,  these  machines  are  
currently    idle
– Accelerator  has  regular  maintenance  slots  of  several  days
– Long  Shutdown  due  from  March  2013-­‐November  2014
• One  of  the  experiments  are  deploying  OpenStack  on  their  farm
– Simulation  (low  I/O,  high  CPU)
– Analysis  (high  I/O,  high  CPU,  high  network)
Gavin  McCance,  CERN 31
New  architecture  data  flows
Gavin  McCance,  CERN 32

Contenu connexe

Tendances

Manage Add-On Services with Apache Ambari
Manage Add-On Services with Apache AmbariManage Add-On Services with Apache Ambari
Manage Add-On Services with Apache AmbariDataWorks Summit
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Kafka at Scale: Multi-Tier Architectures
Kafka at Scale: Multi-Tier ArchitecturesKafka at Scale: Multi-Tier Architectures
Kafka at Scale: Multi-Tier ArchitecturesTodd Palino
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flinkdatamantra
 
Kafka at Peak Performance
Kafka at Peak PerformanceKafka at Peak Performance
Kafka at Peak PerformanceTodd Palino
 
Open Source DataViz with Apache Superset
Open Source DataViz with Apache SupersetOpen Source DataViz with Apache Superset
Open Source DataViz with Apache SupersetCarl W. Handlin
 
Exploring Java Heap Dumps (Oracle Code One 2018)
Exploring Java Heap Dumps (Oracle Code One 2018)Exploring Java Heap Dumps (Oracle Code One 2018)
Exploring Java Heap Dumps (Oracle Code One 2018)Ryan Cuprak
 
Identifying and Managing Technical Debt
Identifying and Managing Technical DebtIdentifying and Managing Technical Debt
Identifying and Managing Technical Debtzazworka
 
Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3SANG WON PARK
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...Thomas Riley
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's EvolutionDataWorks Summit
 
The Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseThe Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseDataWorks Summit
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetesRishabh Indoria
 
Thanos: Global, durable Prometheus monitoring
Thanos: Global, durable Prometheus monitoringThanos: Global, durable Prometheus monitoring
Thanos: Global, durable Prometheus monitoringBartłomiej Płotka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 

Tendances (20)

Manage Add-On Services with Apache Ambari
Manage Add-On Services with Apache AmbariManage Add-On Services with Apache Ambari
Manage Add-On Services with Apache Ambari
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Kafka at Scale: Multi-Tier Architectures
Kafka at Scale: Multi-Tier ArchitecturesKafka at Scale: Multi-Tier Architectures
Kafka at Scale: Multi-Tier Architectures
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
 
Why to Cloud Native
Why to Cloud NativeWhy to Cloud Native
Why to Cloud Native
 
Kafka at Peak Performance
Kafka at Peak PerformanceKafka at Peak Performance
Kafka at Peak Performance
 
Open Source DataViz with Apache Superset
Open Source DataViz with Apache SupersetOpen Source DataViz with Apache Superset
Open Source DataViz with Apache Superset
 
Exploring Java Heap Dumps (Oracle Code One 2018)
Exploring Java Heap Dumps (Oracle Code One 2018)Exploring Java Heap Dumps (Oracle Code One 2018)
Exploring Java Heap Dumps (Oracle Code One 2018)
 
Identifying and Managing Technical Debt
Identifying and Managing Technical DebtIdentifying and Managing Technical Debt
Identifying and Managing Technical Debt
 
Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
 
The Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseThe Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBase
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
Thanos: Global, durable Prometheus monitoring
Thanos: Global, durable Prometheus monitoringThanos: Global, durable Prometheus monitoring
Thanos: Global, durable Prometheus monitoring
 
Intro to Knative
Intro to KnativeIntro to Knative
Intro to Knative
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 

Similaire à CERN Data Centre Evolution

OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebula Project
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user storylaurabeckcahoon
 
Configuration Management Evolution at CERN
Configuration Management Evolution at CERNConfiguration Management Evolution at CERN
Configuration Management Evolution at CERNGavin McCance
 
CloudLab Overview
CloudLab OverviewCloudLab Overview
CloudLab OverviewEd Dodds
 
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Belmiro Moreira
 
CLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB LaunchCLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB LaunchTom Connor
 
The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicTim Bell
 
CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014Tim Bell
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3Tim Bell
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3Tim Bell
 
Who Needs Network Management in a Cloud Native Environment?
Who Needs Network Management in a Cloud Native Environment?Who Needs Network Management in a Cloud Native Environment?
Who Needs Network Management in a Cloud Native Environment?Eshed Gal-Or
 
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...Ian Lumb
 
Operating OpenStack on a Budget
Operating OpenStack on a BudgetOperating OpenStack on a Budget
Operating OpenStack on a BudgetSamir Ibradzic
 
Operating OpenStack on a Budget
Operating OpenStack on a BudgetOperating OpenStack on a Budget
Operating OpenStack on a BudgetSusan Wu
 
OpenStack at EBSCO
OpenStack at EBSCOOpenStack at EBSCO
OpenStack at EBSCOTesora
 
Dell openstack cloud with inktank ceph – large scale customer deployment
Dell openstack cloud with inktank ceph – large scale customer deploymentDell openstack cloud with inktank ceph – large scale customer deployment
Dell openstack cloud with inktank ceph – large scale customer deploymentKamesh Pemmaraju
 
NICS Puppet Case Study
NICS Puppet Case StudyNICS Puppet Case Study
NICS Puppet Case StudyPuppet
 
Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Belmiro Moreira
 
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data DataCentred
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...David Wallom
 

Similaire à CERN Data Centre Evolution (20)

OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user story
 
Configuration Management Evolution at CERN
Configuration Management Evolution at CERNConfiguration Management Evolution at CERN
Configuration Management Evolution at CERN
 
CloudLab Overview
CloudLab OverviewCloudLab Overview
CloudLab Overview
 
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
 
CLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB LaunchCLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB Launch
 
The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack Nordic
 
CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
 
Who Needs Network Management in a Cloud Native Environment?
Who Needs Network Management in a Cloud Native Environment?Who Needs Network Management in a Cloud Native Environment?
Who Needs Network Management in a Cloud Native Environment?
 
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
 
Operating OpenStack on a Budget
Operating OpenStack on a BudgetOperating OpenStack on a Budget
Operating OpenStack on a Budget
 
Operating OpenStack on a Budget
Operating OpenStack on a BudgetOperating OpenStack on a Budget
Operating OpenStack on a Budget
 
OpenStack at EBSCO
OpenStack at EBSCOOpenStack at EBSCO
OpenStack at EBSCO
 
Dell openstack cloud with inktank ceph – large scale customer deployment
Dell openstack cloud with inktank ceph – large scale customer deploymentDell openstack cloud with inktank ceph – large scale customer deployment
Dell openstack cloud with inktank ceph – large scale customer deployment
 
NICS Puppet Case Study
NICS Puppet Case StudyNICS Puppet Case Study
NICS Puppet Case Study
 
Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015
 
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...
 

Dernier

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dashnarutouzumaki53779
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Dernier (20)

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dash
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

CERN Data Centre Evolution

  • 1. CERN  Data  Centre  Evolution Gavin  McCance gavin.mccance@cern.ch @gmccance SDCD12:  Supporting  Science  with  Cloud  Computing Bern 19th November  2012
  • 2. What  is  CERN  ? Gavin  McCance,  CERN 2 • Conseil Européen pour  la   Recherche Nucléaire – aka   European  Laboratory  for   Particle  Physics • Between  Geneva  and  the   Jura  mountains,  straddling   the  Swiss-­‐French  border • Founded  in  1954  with  an   international  treaty • Our  business  is  fundamental   physics  ,  what  is  the   universe  made  of  and  how   does  it  work
  • 3. Gavin  McCance,  CERN 3 Answering fundamental questions… • How  to  explain particles have  mass? We have  theories and  accumulating experimental evidence..  Getting close… • What is 96%  of  the  universe made  of  ? We can only see 4%  of  its estimated mass! • Why isn’t there anti-­‐matter in  the  universe? Nature  should be symmetric… • What was the  state  of  matter just after the  « Big Bang »  ? Travelling  back  to  the  earliest instants  of the  universe would help…
  • 4. 4 The  Large  Hadron  Collider  (LHC)  tunnel Gavin  McCance,  CERN
  • 6. Gavin  McCance,  CERN 6 • Data  Centre  by  Numbers – Hardware  installation  &  retirement • ~7,000  hardware  movements/year;  ~1,800  disk  failures/year Xeon   5150 2% Xeon   5160 10% Xeon   E5335 7% Xeon   E5345 14% Xeon   E5405 6% Xeon   E5410 16% Xeon   L5420 8% Xeon   L5520 33% Xeon   3GHz 4% Fujitsu 3% Hitachi 23% HP 0% Maxtor 0% Seagate 15% Western   Digital 59% Other 0% High  Speed  Routers (640 Mbps  →  2.4  Tbps) 24 Ethernet  Switches 350 10  Gbps  ports 2,000 Switching  Capacity 4.8 Tbps 1  Gbps  ports 16,939 10  Gbps  ports 558 Racks 828 Servers 11,728 Processors 15,694 Cores 64,238 HEPSpec06 482,507 Disks 64,109 Raw  disk  capacity  (TiB) 63,289 Memory  modules 56,014 Memory  capacity  (TiB) 158 RAID  controllers 3,749 Tape  Drives 160 Tape  Cartridges 45,000 Tape  slots 56,000 Tape Capacity  (TiB) 73,000 IT  Power  Consumption 2,456  KW Total Power  Consumption 3,890  KW
  • 7. Current  infrastructure • Around  12k  servers – Dedicated  compute,  dedicated  disk  server,  dedicated  service  nodes – Majority  Scientific  Linux  (RHEL5/6  clone) – Mostly  running  on  real  hardware – Last  couple  of  years,  we’ve  consolidated  some  of  the  service  nodes   onto  Microsoft  HyperV – Various  other  virtualisation  projects  around • In  2002  we  developed  our  own  management  toolset – Quattor /  CDB  configuration  tool – Lemon  computer  monitoring – Open  source,  but  a  small  community Gavin  McCance,  CERN 7
  • 8. • Many  diverse  applications  (”clusters”)   • Managed  by  different  teams  (CERN  IT  +  experiment  groups) Gavin  McCance,  CERN 8
  • 9. New  data  centre  to  expand  capacity Gavin  McCance,  CERN 9 • Data  centre  in  Geneva   at  the  limit  of   electrical  capacity  at   3.5MW • New  centre  chosen  in   Budapest,  Hungary • Additional  2.7MW  of   usable  power • Hands  off  facility • Deploying  from  2013   with  200Gbit/s   network  to  CERN
  • 10. Time  to  change  strategy • Rationale – Need  to  manage  twice  the  servers  as  today – No  increase  in  staff  numbers – Tools  becoming  increasingly  brittle  and  will  not  scale  as-­‐is • Approach – CERN  is  no  longer  a  special  case  for  compute – Adopt  an  open  source  tool  chain  model – Our  engineers  rapidly  iterate • Evaluate  solutions  in  the  problem  domain • Identify  functional  gaps  and  challenge  old  assumptions • Select  first  choice  but  be  prepared  to  change  in  future – Contribute  new  function  back  to  the  community Gavin  McCance,  CERN 10
  • 11. Building  Blocks Gavin  McCance,  CERN 11 Bamboo   Koji,  Mock AIMS/PXE Foreman Yum  repo Pulp Puppet-­DB mcollective,  yum JIRA Lemon  / Hadoop git OpenStack   Nova Hardware   database Puppet Active  Directory  / LDAP
  • 12. Choose  Puppet  for  Configuration • The  tool  space  has  exploded  in  last  few  years – In  configuration  management  and  operations • Puppet and  Chef are  the  clear  leaders  for  ‘core  tools’ • Many  large  enterprises  now  use  Puppet – Its  declarative  approach  fits  what  we’re  used  to  at  CERN – Large  installations:  friendly,  wide-­‐based  community – You  can  buy  books  on  it – You  can  employ  people  who  know  it  better  than  do Gavin  McCance,  CERN 12
  • 13. Puppet  Experience • Excellent:  basic  puppet  is  easy  to  setup and  can  be  scaled-­‐up  well • Well  documented,  configuring  services  with  it  is  easy • Handle  our  cluster  diversity  and  dynamic  clouds  well • Lots  of  resource  (“modules”)  online,  though  of  varying  quality • Large,  responsive  community  to  help • Lots  of  nice  tooling  for  free – Configuration  version  control  and  branching:  integrates  well  with  git – Dashboard:  we  use  the  Foreman dashboard • We’re  moving  all  our  production  service  over  in  2013 Gavin  McCance,  CERN 13
  • 15. Preparing  the  move  to  cloud • Improve  operational  efficiency  and  dynamicness – Dynamic  multiple  operating  system  demand – Dynamic  temporary  load  spikes  for  special  activities – Hardware  interventions  with  long  running  programs  (live  migration) • Improve  resource  efficiency – Exploit  idle  resources,  especially  waiting  for  disk  and  tape  I/O – Highly  variable  load  such  as  interactive  or  build  machines • Enable  cloud  architectures – Gradual  migration  from  traditional  batch  +  disk  to  cloud  interfaces  and   workflows • Improve  responsiveness – Self-­‐Service  with  coffee  break  response  time Gavin  McCance,  CERN 15
  • 16. What  is  OpenStack  ? • OpenStack  is  a  cloud  operating  system  that  controls  large   pools  of  compute,  storage,  and  networking  resources   throughout  a  datacenter,  all  managed  through  a  dashboard   that  gives  administrators  control  while  empowering  their  users   to  provision  resources  through  a  web  interface Gavin  McCance,  CERN 16
  • 17. Service  Model Gavin  McCance,  CERN 17 • Pets  are  given  names  like   pussinboots.cern.ch   • They  are  unique,  lovingly  hand  raised   and  cared  for • When  they  get  ill,  you  nurse  them  back   to  health • Cattle  are  given  numbers  like   vm0042.cern.ch • They  are  almost  identical  to  other  cattle • When  they  get  ill,  you  get  another  one • Future  application  architectures  should  use  Cattle  but  Pets  with   strong  configuration  management  are  viable  and  still  needed Borrowed  from @randybias at  Cloudscaling http://www.slideshare.net/randybias/the-­‐cloud-­‐ revolution-­‐cyber-­‐press-­‐forum-­‐philippines
  • 18. Basic  Openstack Components Gavin  McCance,  CERN 18 Compute Scheduler NetworkVolume Registry Image KEYSTONE HORIZON NOVAGLANCE • Each  component  has  an  API  and  is  pluggable • Other  non-­‐core  projects  interact  with  these  components  
  • 19. Supporting  the  Pets  with  OpenStack • Network – Interfacing  with  legacy  site  DNS  and  IP  management – Ensuring  Kerberos  identity  before  VM  start • Puppet – Ease  use  of  configuration  management  tools  with  our  users – Exploit  mcollective  for  orchestration/delegation • External  Block  Storage – Currently  using  nova-­‐volume  with  Gluster backing  store • Live  migration  to  maximise  availability – KVM  live  migration  using  Gluster – KVM  and  Hyper-­‐V  block  migration Gavin  McCance,  CERN 19
  • 20. Current  Status  of  OpenStack  at  CERN • Working  on  an  Essex  code  base  from  the  EPEL  repository – Excellent  experience  with  the  Fedora  cloud-­‐sig  team – Cloud-­‐init for  contextualisation,  oz for  images  with  RHEL/Fedora • Components – Current  focus  is  on  Nova  with  KVM  and  Hyper-­‐V – Keystone  running  with  Active  Directory  and  Glance  for  Linux  and   Windows  images • Pre-­‐production  facility  with  around  200  Hypervisors,    with   2000  VMs  integrated  with  CERN  infrastructure – used  for  simulation  of  magnet  placement  using  LHC@Home and  batch   physics  programs Gavin  McCance,  CERN 20
  • 22. Next  Steps • Deploy  into  production  at  the  start  of  2013  with  Folsom  running   production  services  and  compute  on  top  of  OpenStack  IaaS • Support  multi-­‐site  operations  with  2nd data  centre  in  Hungary • Exploit  new  functionality – Ceilometer  for  metering – Bare  metal  for  non-­‐virtualised  use  cases  such  as  high  I/O  servers – X.509  user  certificate  authentication – Load  balancing  as  a  service Ramping  to  15K  hypervisors  with  100K   VMs  by  2015   Gavin  McCance,  CERN 22
  • 23. Conclusions • CERN  computer  centre  is  expanding • We’re  in  the  process  of  refurbishing  the  tools  we  use   to  manage  the  centre  based  on  Openstack for  IaaS and  Puppet for  configuration  management • Production  at  CERN  in  next  few  months  on  Folsom – Gradual  migration  of  all  our  services • Community  is  key  to  shared  success – CERN  contributes  and  benefits Gavin  McCance,  CERN 23
  • 25. Training  and  Support • Buy  the  book  rather  than  guru  mentoring • Follow  the  mailing  lists  to  learn • Newcomers  are  rapidly  productive  (and  often  know  more  than  us) • Community  and  Enterprise  support  means  we’re  not  on  our  own Gavin  McCance,  CERN 25
  • 26. Staff  Motivation • Skills  valuable  outside  of  CERN  when  an  engineer’s  contracts   end Gavin  McCance,  CERN 26
  • 27. When  communities  combine… • OpenStack’s  many  components  and  options  make   configuration  complex  out  of  the  box • Puppet  forge module  from  PuppetLabs  does  our  configuration • The  Foreman  adds  OpenStack  provisioning  for  user  kiosk  to  a   configured  machine  in  15  minutes Gavin  McCance,  CERN 27
  • 28. Foreman  to  manage  Puppetized VM Gavin  McCance,  CERN 28
  • 29. Active  Directory  Integration • CERN’s  Active  Directory – Unified  identity  management  across  the  site – 44,000  users – 29,000  groups – 200  arrivals/departures  per  month • Full  integration  with  Active  Directory  via  LDAP – Uses  the  OpenLDAP backend  with  some  particular  configuration   settings – Aim  for  minimal  changes  to  Active  Directory – 7  patches  submitted  around  hard  coded  values  and  additional  filtering • Now  in  use  in  our  pre-­‐production  instance – Map  project  roles  (admins,  members)  to  groups – Documentation  in  the  OpenStack  wiki Gavin  McCance,  CERN 29
  • 30. What  are  we  missing  (or  haven’t  found  yet)  ? • Best  practice  for – Monitoring  and  KPIs  as  part  of  core  functionality – Guest  disaster  recovery – Migration  between  versions  of  OpenStack • Roles  within  multi-­‐user  projects – VM  owner  allowed  to  manage  their  own  resources  (start/stop/delete) – Project  admins  allowed  to  manage  all  resources – Other  members  should  not  have  high  rights  over  other  members  VMs • Global  quota  management  for  non-­‐elastic  private  cloud – Manage  resource  prioritisation  and  allocation  centrally – Capacity  management  /  utilisation  for  planning Gavin  McCance,  CERN 30
  • 31. Opportunistic  Clouds  in  online  experiment  farms • The  CERN  experiments  have  farms  of  1000s  of  Linux  servers   close  to  the  detectors  to  filter  the  1PByte/s  down  to  6GByte/s   to  be  recorded  to  tape • When  the  accelerator  is  not  running,  these  machines  are   currently    idle – Accelerator  has  regular  maintenance  slots  of  several  days – Long  Shutdown  due  from  March  2013-­‐November  2014 • One  of  the  experiments  are  deploying  OpenStack  on  their  farm – Simulation  (low  I/O,  high  CPU) – Analysis  (high  I/O,  high  CPU,  high  network) Gavin  McCance,  CERN 31
  • 32. New  architecture  data  flows Gavin  McCance,  CERN 32

Notes de l'éditeur

  1. Established by an international treaty at the end of 2nd world war as a place where scientists could work together for fundamental researchNuclear is part of the name but our world is particle physics
  2. Our current understanding of the universe is incomplete. A theory, called the Standard Model, proposes particles and forces, many of which have been experimentally observed. However, there are open questions- Why do some particles have mass and others not ? The Higgs Boson is a theory but we need experimental evidence.Our theory of forces does not explain how Gravity worksCosmologists can only find 4% of the matter in the universe, we have lost the other 96%We should have 50% matter, 50% anti-matter… why is there an asymmetry (although it is a good thing that there is since the two anhialiate each other) ?When we go back through time 13 billion years towards the big bang, we move back through planets, stars, atoms, protons/electrons towards a soup like quark gluon plasma. What were the properties of this?
  3. The ring consists of two beam pipes, with a vacuum pressure 10 times lower than on the moon which contain the beams of protons accelerated to just below the speed of light. These go round 11,000 times per second being bent by the superconducting magnets cooled to 2K by liquid helium (-450F), colder than outer space. The beams themselves have a total energy similar to a high speed train so care needs to be taken to make sure they turn the corners correctly and don’t bump into the walls of the pipe.
  4. To improve the statistics, we send round beams of multiple bunches, as they cross there are multiple collisions as 100 billion protons per bunch pass through each otherSoftware close by the detector and later offline in the computer centre then has to examine the tracks to understand the particles involved
  5. So, to the Tier-0 computer centre at CERN… we are unusual in that we are public with our environment as there is no competitive advantage for us. We have thousands of visitors a year coming for tours and education and the computer center is a popular visit.The data centre has around 2.9MW of usable power looking after 12,000 servers.. In comparison, the accelerator uses 120MW, like a small town.With 64,000 disks, we have around 1,800 failing each year… this is much higher than the manufacturers’ MTBFs which is consistent with results from Google.Servers are mainly Intel processors, some AMD with dual core Xeon being the most common configuration.
  6. Asked member states for offers200Gbit/s links connecting the centresExpect to double computing capacity compared to today by 2015
  7. Double the capacity, same manpowerNeed to rethink how to solve the problem… look at how others approach itWe had our own tools in 2002 and as they become more sophisticated, it was not possible to take advantage of other developments elsewhere without a major break.Doing this while doing their ‘day’ jobs so it re-enforces the approach of taking what we can from the community
  8. Model based on Google Toolchain, Puppet is key for many operations. We’ve only had to write one new significant custom CERN software component which is in the certificate authority. Other parts such as Lemon for monitoring are from our previous implementation as we did not want to change all at once and they scale.
  9. Standardise hardware … buy in bulk and pile it up then work out what to use it forMemory, motherboards, cables or disks interventionsUsers waiting for I/O means wasted cycles. Build machines at night unused during the day. Interactive machines mainly during the dayMove to cloud APIs … need to support them but also maintain our existing applicationsDetails later on reception and testing
  10. Puppet applies well to the cattle model but we’re also using it to handle the pet cases that can’t yet move over due to software limitations. So, they get cloud provisioning but flexible configuration management.
  11. Complex to configure… take advantage of the experience of others
  12. We’ve been very pleased with our choices. Along with the obvious benefits of the functionality, there are soft benefits from the community model.
  13. Many staff at CERN are short term contracts… good benefits for those staff to leave with skills in need.
  14. Communities integrating … when a new option is being used at CERN in OpenStack, we contribute the changes back to the puppet forge such as certificate handling. Even looking at Hyper-V/Windows openstack configuration…