1

Hello everyone, my name is Alon Horev, I'm based in Israel and I'm working at Intucell, which was acquired by Cisco.

I'm a Python developer and I lead Intucell's data team. About two years ago we migrated our product off MySQL and started working with MongoDB.

I want to start off by introducing our use case of MongoDB: we've built a system that optimizes cellular networks automatically. Optimizing cellular networks is about making your data connection faster and improving the quality of your calls.
2

The way we do this is pretty simple: we collect a lot of statistics about what goes on in the network, like how many calls are taking place or how many users are connected to the antenna.

We then analyze this information to identify things like which antennas are loaded. Once we know what the problems in the network are, we act: we change parameters in the network. For example, we would force your phone to use a different antenna so you'll get better service.

Now, as you can see, this process is cyclic: we'll collect more statistics to make further changes and make sure we've improved the network. This happens all the time, even here right now, with AT&T.

In the process of working with MongoDB we learned a lot about database performance and server performance. I personally spent a lot of time monitoring and optimizing the storage and memory usage, which brings me to this lecture.
	
  
3

Today I'm going to try to give you an understanding of how MongoDB manages memory.

So, first, what is "memory management" when it comes to MongoDB?
Well, memory is a fast but limited and expensive resource; memory management is about deciding what data to keep in memory.
4

Why should you care about memory management?
Memory management has a huge impact on performance and costs.
This matters to both developers and DBAs: as a developer you can optimize the schema and queries for better memory usage, and as a DBA you can monitor and predict performance issues related to memory usage.
I'm pretty sure every MongoDB administrator has asked himself at least once: how much memory do I really need?

Before we dive in I want to tell you a little secret: MongoDB doesn't actually manage memory. It leaves that responsibility to the operating system.
5

Within the operating system there's a stack of components that MongoDB depends on to manage memory. Each component relies on the component below it.
(!)
This talk is structured around this stack of components.
We'll start with the low-level components, the storage devices: disks and RAM.
We'll continue with the page cache and memory-mapped files, which are part of the operating system's kernel.
And we'll finish off with MongoDB's usage of these mechanisms.
(!)

Let's talk about storage.
6

There are different types of storage devices with different characteristics; we'll review hard disk drives, solid state drives and RAM.

Let's start by breaking these into categories: (!) HDDs and SSDs are persistent and RAM isn't, but RAM is really fast.
That's why every computer has both types of storage: one persistent (an HDD or an SSD) and one volatile (RAM).
7

Now let's compare throughput. As I said before, RAM is fast; it can go as fast as 6400 MB/s for reads and writes.
SSDs are 10 times slower than RAM; modern SSDs can reach a read rate of 650 MB/s and a little less for writes.
HDDs are much slower, ranging from 1 MB/s to 160 MB/s for reads and writes.

The reason there's such variance in HDD speed is that throughput is highly affected by access patterns. Specifically with HDDs, random access is much slower than sequential access, because an HDD contains a mechanical arm that needs to move on almost every random access.
Sadly for us, databases do a lot of random I/O. Which means that if you're running a query on data that's not in memory, so it has to be read from disk, you're seeing a penalty of about two orders of magnitude on response times.

The next characteristic is price. (!)
To make the comparison easier we'll compare the price per GB. It's not surprising that there's a correlation between price and throughput: the more you pay for each GB, the better throughput you get. So hard drives are really cheap at 5 cents per GB, SSDs are 10 times more expensive and RAM is 100 times more expensive.
8

Is this information sufficient to choose the optimal hardware configuration? I think not; your application's requirements are also part of the equation.
For example, if your application is an archive that saves huge amounts of rarely accessed data, you can go for a large HDD and save a lot of money.
Later on we'll see how you can take measurements of things like RAM and capacity, and then you'll be able to determine what kind of hardware configuration you need.
9

Now let's zoom out of storage and move up to the next layer, which is the page cache.
10

The page cache is part of the operating system's kernel, and whenever a program does file I/O, like reads and writes, it always goes through the page cache.
The page cache makes reads faster by saving popular chunks of data in memory, and makes writes faster by letting the application write to memory and not to disk.
So we can say the page cache was invented to combine the disk's persistence with memory's speed. It's about having the best of both worlds.
11

So... it's called the page cache, but what is a page?

A page is a 4K chunk of data. Each file is broken into pages; the number of pages belonging to a file is simply the file's size divided by 4K.
(!)
Looking at the example, you can see a file spanning 3 pages because it's 10 kilobytes in size. The grey area is an unused part of the last page, as the file's size isn't a multiple of 4 kilobytes.

The page cache's job is to determine which pages to save in
  memory.	
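The page arithmetic is simple enough to check; a minimal sketch (4096 bytes is the usual page size on Linux, and this helper is mine, not kernel code):

```python
PAGE_SIZE = 4096  # the usual page size on Linux

def pages_for(file_size: int) -> int:
    # Round up: a partial page still occupies a whole page.
    return -(-file_size // PAGE_SIZE)

# The 10KB file from the slide spans 3 pages; the tail of the
# third page is the unused grey area.
print(pages_for(10 * 1024))  # → 3
```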
  	
  
12

Let's dive a little deeper and see what happens behind the scenes when we read from a file.
(!)
We have a process running in user space and it's reading 100 bytes from a file.
(!)
Through a system call we get to the kernel, where the page cache handles the read request.
(!)
First, the page cache translates the position and count of bytes to read into a list of pages. If we read 100 bytes from the beginning of the file, the result of this step would be the first page.
(!)
The next thing the page cache does is check whether the page exists in the cache; (!) if it doesn't, the data has to be read from disk and then stored in the cache.
Once the page is in the cache we reach the last step, (!) which is to copy the data to the user space application.

So that's how a read
  works.	
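That first translation step, from a byte range to a list of pages, can be sketched as a small helper (a simplification of what happens inside the kernel):

```python
PAGE_SIZE = 4096

def pages_for_read(offset: int, count: int) -> list:
    """Page indices covered by a read of `count` bytes at `offset`."""
    first = offset // PAGE_SIZE
    last = (offset + count - 1) // PAGE_SIZE
    return list(range(first, last + 1))

# Reading 100 bytes from the start of the file touches only page 0.
print(pages_for_read(0, 100))     # → [0]
# A read straddling a page boundary touches two pages.
print(pages_for_read(4000, 200))  # → [0, 1]
```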
  
	
  
13

The page cache also handles writes.
(!)
This time our process is calling the write system call.
(!)
The page cache copies the data from the process to the relevant pages and marks them as dirty. That's all it does: change data in memory.
It gives the impression the data has been written, when in fact it has been written only to memory and not to disk. If an application reads from the file it will get the latest data from memory, because dirty pages must stay in the cache.

Having dirty pages is somewhat dangerous for two reasons: first, they will be lost if the operating system crashes. Second, if there's a lack of memory they can't be freed.
The solution to these problems is to flush the dirty pages to disk. (!) There's a thread in the kernel that flushes pages after they stay in the cache for some time or when memory needs to be freed.

If a process wants to make sure the data is flushed to disk it can call the fsync system call, which can trigger a flush for a specific file or even the entire file system.
MongoDB calls that every 30 seconds to make sure data is backed by
  disk.	
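From Python the same flush can be requested explicitly; a minimal sketch (the file path is just for illustration):

```python
import os

# The write lands in the page cache first, marking the page dirty.
fd = os.open("/tmp/dirty-page-demo",
             os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
os.write(fd, b"important data")

# fsync blocks until this file's dirty pages are flushed to disk;
# only after it returns is the data safe from an OS crash.
os.fsync(fd)
os.close(fd)
```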
  
	
  
	
  
	
  
	
  
	
  
14

I mentioned how the page cache frees pages when memory is running low; this procedure is called page reclamation.

There are different page reclamation policies. A page reclamation policy is an algorithm that answers a simple question: "what's the next page that can be freed?"
In Linux, the simple answer is: "the one that was least recently used".

It turns out page reclamation happens all the time, even on healthy systems; it doesn't mean you're out of memory.
That's because the page cache is greedy and will try to use all the free memory on your machine to cache the file system.

To understand how much memory is used by the page cache you can use the free command.
  
15

free is a Linux program that displays memory usage statistics. Let's try to interpret its output.
When running free with -g it prints units in GB. The first line shows the total amount of memory, which is 64GB; out of these, 61GB are used and 3GB are free.
Then, out of the 61GB that are used, 55GB are cached data. These are pages in the page cache.
The second line counts the cached data as free, so we suddenly have only 5GB of used memory. This is memory directly allocated by programs.
The reason cached memory can be considered free is that even though the memory is used, it will be freed if programs need it.
As soon as programs allocate memory and the free memory runs out, the page cache shrinks and frees
  pages.	
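The relationship between the two lines is just arithmetic; here's a sketch with the numbers from this example (the 1GB of buffers is an assumption to make the figures add up — the example only states the cached amount):

```python
def interpret_free(total, used, free, buffers, cached):
    # The second ("-/+ buffers/cache") line treats buffered and cached
    # pages as available, because the kernel reclaims them on demand.
    return {
        "used_by_programs": used - buffers - cached,
        "effectively_free": free + buffers + cached,
    }

# 64GB total, 61GB used, 55GB of it page cache -> only 5GB truly used.
print(interpret_free(total=64, used=61, free=3, buffers=1, cached=55))
# → {'used_by_programs': 5, 'effectively_free': 59}
```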
  
	
  
16

The next component up the stack is memory-mapped files.
  
	
  
17

Memory mapping of files is an alternative mechanism for reading from and writing to files. Instead of calling the read() and write() system calls, a process can map a part of a file into memory, and every access the process makes to that memory translates to a file read or write.

On the left you can see a process with a memory region that is mapped to a segment of a file.
So memory addresses 100 to 200 are mapped to a file segment that starts at 400 and ends at 500.
A write to memory address 100 is translated to a write to the file at address 400.

Mapping a file into memory doesn't necessarily load its data into memory; if a process reads from a page that is not in memory, the infamous page fault is triggered.
The code in the kernel that handles page faults tells the page cache to load the required pieces of data from disk and then serves the read.

So memory mapping has several advantages over regular file I/O:
First, it's fast: there's no system call involved and no copying of memory. Reads and writes access memory that is allocated in the page cache.
Second, it takes the responsibility for memory management away from the user. As we've seen earlier, the page cache will determine what's actually stored in
  memory.	
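In Python the same mechanism is exposed through the standard `mmap` module; a minimal sketch (the file path and contents are just for illustration):

```python
import mmap

# Create a small file and map it into memory.
with open("/tmp/mmap-demo", "wb") as f:
    f.write(b"hello world")

with open("/tmp/mmap-demo", "r+b") as f:
    mem = mmap.mmap(f.fileno(), 0)  # length 0 maps the whole file
    print(mem[:5])        # reads are served from the page cache → b'hello'
    mem[:5] = b"HELLO"    # a plain memory write becomes a file write
    mem.flush()           # like fsync: force the dirty pages to disk
    mem.close()

print(open("/tmp/mmap-demo", "rb").read())  # → b'HELLO world'
```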
  
18

In this example two processes map the same region of a file into memory. Only one copy of this data will occupy memory, or even less if it's not accessed.
Historically this mechanism was invented to reduce the memory usage of processes. Whenever you execute a program, the program's code and its shared libraries are mapped into memory.
So if you open 10 instances of Chrome, its code still appears once in memory.
  
	
  
19

Now let's see how Mongo uses this stack of components.
  
20

(!)
Mongo maps all its data into memory. This includes the documents, the indexes and the journal.
(!)
When running top you can actually see how much memory is mapped and how much is used.
(!)
The left column, called VIRT, stands for virtual memory; once a process maps files into memory, they're accounted under virtual memory.
When using journaling, Mongo actually maps the data files twice, so this figure is twice the amount on disk, which is about 273GB.
RES stands for resident memory and is the amount of the virtual memory that's actually located in RAM.
SHR stands for shared resident memory. So out of the 24GB of resident memory, 23GB is data from memory-mapped files, which is sharable.
  
	
  
	
  
21

It turns out this very cool strategy for managing memory also has problems. The biggest problem is that MongoDB (!) has no control over what is saved in memory. You can't tell Mongo: promise me this document or collection is stored in memory, thereby ensuring fast access.

Why is this a problem? I'll give you some examples:
1. (!) The first example is warm-up: after restarting your server, none of the data is stored in memory, so for every page that is accessed for the first time a page fault will be triggered and the query will take longer.
2. (!) The second example is what I call expensive queries. Expensive queries are queries that aren't indexed well or request data that is hardly ever accessed. When these things happen, documents are loaded into memory at the cost of freeing other documents that are more important. Why does this happen? As we've seen before, the page cache frees the least recently used pages first.

There are things you can do to mitigate this problem.
  
22

What we did is (!) protect MongoDB with an API. The API enforces index usage so Mongo reads fewer documents into memory. Another thing the API does is pass a query timeout to make sure costly queries are cancelled.

The API doesn't have to be complicated; it could be a simple module sitting on top of the MongoDB driver.
Let's look at an example: (!) this is (!) a Python function called find_samples, and it's used whenever we want to run a find query on the collection named samples.
The function accepts two parameters that define a date range: start_time and end_time. By forcing the user to pass a date range we make sure the query is indexed. You could add further validations to make sure the range isn't too big or doesn't go too far back in
  history.	
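I don't have the original code, but a sketch of such a wrapper might look like this (the `time` field name and the 31-day limit are assumptions; the real version would also execute the query with a server-side timeout):

```python
import datetime

MAX_RANGE = datetime.timedelta(days=31)  # assumed limit: a month of history

def find_samples(start_time, end_time):
    """Build the filter for a find() on the samples collection,
    forcing callers to supply an indexed date range."""
    if end_time <= start_time:
        raise ValueError("end_time must be after start_time")
    if end_time - start_time > MAX_RANGE:
        raise ValueError("date range too large")
    # The real version would run this filter through the driver,
    # with a query timeout attached.
    return {"time": {"$gte": start_time, "$lt": end_time}}
```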
  
23

Another challenge worth mentioning is (!) the lack of prioritization between processes. When processes allocate a lot of memory the page cache shrinks automatically, and since Mongo relies on the page cache, you could say Mongo's memory shrinks automatically. In other words, Mongo has a lower priority than other processes when it comes to memory. Since Mongo will just become slower if it doesn't have enough memory, you need to be careful with other processes running on the same server.

You can mitigate this phenomenon by isolating Mongo. (!) Don't run it on the same server along with memory- or disk-intensive applications.

The last challenge I'd like to tackle is (!) estimating how much memory is required, also known as the size of the working set.
  
	
  
	
  
	
  
24

So what is the working set? It's the data that your application reads regularly and that should be returned in a timely manner; therefore it should fit in memory.
The working set contains (!) more than documents; it also includes indexes and some padding.
To emphasize the padding issue let's look at an example memory page.
(!)
As I mentioned before, a page's size is 4K.
This page includes 3 documents, and between the documents there's some padding. This padding accounts for expansion of existing documents or insertion of new ones.
Out of the three documents, only document number 2 is accessed regularly.
So even though a small part of this page is actually used, the whole page is saved in memory; the page cache can't save half pages in memory.

This brings us to the conclusion that it's really hard to measure the size of the working set by simply looking at the count or size of the documents being queried.

Still, there are several tools to help you estimate how much memory a collection should require.
  
	
  
25

The tools fall into two categories: planning and monitoring.
  
	
  
26

Planning is about predicting how much memory each collection is going to need.
Let's take a real-world example. In one of our collections we save a month of history; out of that month we know our application often queries the last two weeks and sometimes the week before that. The last two weeks are considered "hot data" because they have to be stored in memory; the week before that is considered warm: it doesn't have to be in memory, but we should still take it into account so it won't push out the hot data.

If we're going to add some spare room to compensate for padding and such, it's safe to assume 3 out of the 4 weeks should fit in memory.

(!)
You can use the collection stats command to get important metrics, like the size of the indexes and the size of the data, and roughly calculate how much memory the collection is going to
  require.	
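A back-of-the-envelope version of that calculation might look like this (`size` and `totalIndexSize` are real collStats fields; the hot fraction and padding factor are assumptions you'd tune per collection):

```python
def estimate_memory(stats, hot_fraction=0.75, padding=1.2):
    """Rough working-set estimate from a collStats document:
    the regularly queried slice of the data, padded, plus all indexes."""
    hot_data = stats["size"] * hot_fraction * padding
    # Indexes should fit in memory in their entirety.
    return hot_data + stats["totalIndexSize"]

# 40GB of data of which ~3 of 4 weeks are hot, plus 4GB of indexes.
gb = 1024 ** 3
print(estimate_memory({"size": 40 * gb, "totalIndexSize": 4 * gb}) / gb)
```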
  
	
  
Once you have a running database you can use several monitoring tools to analyze the working set.
  
27

When I think about monitoring tools, they generally fall into two categories:
1. (!) One is online monitoring, which is basically seeing what's going on at the moment. This category includes running Linux commands like top and iostat, or Mongo commands like currentOp, mongostat and mongomem.
2. (!) The second category is offline monitoring, which is more about collecting and aggregating historical data. One example would be the profiling collection that collects slow queries over time. Another example is MMS, or other graphing tools like Graphite, that collect different metrics over time; these are used for identifying trends, correlations and predicting growth.
Let's start with the online tools.
  
28

Mongomem is a great tool for memory use analysis. It's written in Python by the people at a company called Wish, so you'll have to install it manually; it doesn't come packaged with MongoDB.
Mongomem won't tell you how much memory you need, but it will tell you how much memory each collection is using at the moment.
Here's an example output: (!) each line shows how many megabytes of the collection are in memory. The top collection in this example is the oplog, with more than 11GB of data in memory out of almost 50GB of data, so about 22% of the collection is in memory.
The last line shows the total amount of memory used by Mongo out of the total data size; in this example we have 16GB of data in memory out of 280GB of total data.

Since I've got 16GB of memory on this machine, we can see all the memory is being used.
But what does this say about the working set? Is it larger than memory? In other words, do we have enough memory?

Well, we can't say, because it's possible there's data in memory that is hardly ever accessed. The page cache just hasn't had to reclaim those pages.
  
29

What you can do in order to test how much RAM Mongo actually uses is the following procedure:
1. The first thing you have to do is stop the database.
2. Then you need to clear the page cache; the following command invokes some code in the kernel that drops all pages from memory.
3. The next step is to start the database.
4. After that you need to invoke the queries that should cover your working set: queries that access all the data you expect to have in memory.
5. At this point, running mongomem will give you a more accurate picture of how much memory is required.
  
30

Before looking at additional tools I want to answer a simple question: how do we know when something is wrong? What do we need to monitor?
And since we're talking about memory, how do we know we don't have enough of it?

Well, the phenomenon of not having enough memory is called thrashing.
When the OS is thrashing, it's because an application is constantly accessing pages that are not in memory, and the OS is busy handling the page faults, reading the pages from disk.

So the first thing to monitor is page faults (!), and since it's hard to tell how many page faults are too many, you should also look at disk utilization: if the disk is utilized 100% of the time, you're in trouble.
There are a lot of other things that go wrong, like (!) a lot of queries being queued and high locking ratios, but these are just symptoms.
  
	
  
	
  
31

I usually use iostat for looking at disk utilization.

Here's an example output of the command. The rightmost column shows the disk utilization and reveals a disk that is busy 100% of the time.
The second column shows the disk serves 570 reads per second, and the third column shows the number of writes per second, which is zero.
If this is happening constantly, the working set does not fit in
  memory.	
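If you want to alert on this automatically, the %util figure is easy to pull out of `iostat -x` output, where it's the last column of each device line; a sketch (the sample line is made up to mirror the numbers above):

```python
def util_percent(iostat_device_line: str) -> float:
    """%util is the last column of an `iostat -x` device line."""
    return float(iostat_device_line.split()[-1])

# A made-up device line: 570 reads/s, 0 writes/s, 100% utilized.
line = "sda 0.00 0.00 570.00 0.00 4560.00 0.00 8.00 1.00 1.75 1.75 100.00"
print(util_percent(line))  # → 100.0
```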
  
	
  
Along with iostat, I frequently use mongostat.
  
32	
  
Mongostat comes packaged with MongoDB and uses the underlying serverStatus command. It displays a bunch of interesting metrics, like the number of page faults and queued reads.
It's pretty hard to say how many page faults are too many, but more than one or two hundred page faults per second is an indication of a lot of data being read from disk. If this happens over long periods of time, it could be an indication that the working set does not fit in RAM.
If the number of queued reads is larger than a hundred over long periods of time, it could also be an indication that the working set doesn't fit in RAM.
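The page-fault rate isn't reported directly; it has to be derived from two serverStatus snapshots taken some interval apart (which is essentially what mongostat does). A minimal sketch, using made-up snapshot dicts — on a live system you'd fetch each one with pymongo's `client.admin.command("serverStatus")`:

```python
# Two hypothetical serverStatus snapshots, 10 seconds apart.
snapshot_a = {"uptime": 3600, "extra_info": {"page_faults": 120_000}}
snapshot_b = {"uptime": 3610, "extra_info": {"page_faults": 123_500}}

interval = snapshot_b["uptime"] - snapshot_a["uptime"]
faults_delta = (snapshot_b["extra_info"]["page_faults"]
                - snapshot_a["extra_info"]["page_faults"])
faults_per_sec = faults_delta / interval

# The "more than one or two hundred per second" rule of thumb from above.
if faults_per_sec > 200:
    print(f"warning: {faults_per_sec:.0f} page faults/sec")
```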
  
It's often important to look at these parameters over time in order to determine whether there's a sudden spike or a repeating problem. This brings me to offline monitoring.
  
33	
  
Tools like MMS or graphite can show you these important metrics over time.

Using one of these tools is mandatory for a production system. I cannot tell you how useful they are.
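For graphite, getting a metric in is just a matter of writing lines in its plaintext protocol, `<metric.path> <value> <unix_timestamp>`, to the carbon port. A small sketch of formatting such a line — the metric name is a made-up example, and the actual socket send to carbon (typically TCP port 2003) is omitted:

```python
import time

def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol:
    "<metric.path> <value> <unix_timestamp>\n"."""
    timestamp = int(timestamp if timestamp is not None else time.time())
    return f"{path} {value} {timestamp}\n"

# Hypothetical metric name for queued readers on one shard.
line = graphite_line("mongo.shard1.queued_readers", 312, timestamp=1700000000)
print(line, end="")
```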
  
Whenever we get a ticket about a performance problem, we put our Sherlock hats on and start an investigation.
We look at metrics related to our application, but also at a lot of metrics related to mongo and how they change over time: we look at the number of queries, the number of documents in collections and tens of other metrics.

I'd like to show you an example workflow of a ticket.

Try to picture this: it was a quiet evening and I was about to go to sleep when I got an automated email that one of our shards was misbehaving. What were the symptoms? It had more than 300 queries just waiting in queue. What do I do next?
  
	
  
34	
  
I immediately open graphite. This is a screenshot of the number of page faults in green and the number of queued readers in blue. By looking at the history you can spot two trends:
1. First, there's a spike of high load every hour. This is actually normal, since we're doing hourly aggregations of our data.
2. The second trend is a massive rise in page faults and queued queries at exactly 20:00. At this point there's an impact on users, as a lot of queries take a very long time.
Why is this happening? Has the working set outgrown memory?
  
35	
  
Let's look at another screenshot of the same time frame. This time we look at other metrics: in blue is the number of queries, in green is the number of updates, and in red is the disk utilization.
Remember that disk utilization is measured as a percentage, so even though this graph is lower than the others, we can still see that at 20:00 the disk was constantly utilized at 100%.
When looking at the updates vs. the queries, it's obvious that a huge amount of updates is hurting query performance. We were busy writing to disk.

In this case an application change was the root cause of the problem: the application simply started updating a lot more documents.
So using graphite, we were able to trace the problem to a specific change in our application, and later on we modified our schema to reduce the document size and the load on disk.

This brings me to the next topic, which is optimization.
  
36	
  
When optimizing memory usage, the main target is to reduce the amount of memory your application requires.
The smaller the collections and documents are, the faster queries will be. This holds not just for memory but also for disk: if documents are smaller, less disk access is required to read them.

There are several optimizations you can do when it comes to schema:
1. First, shorten the keys. We started with long names like firstName, then shortened them to a single word or acronym, and finally used one or two letters, since it had a huge impact on the size of our data. By shortening the keys we reduced the size of our data by more than 50%. There is a big downside to doing this because it obscures the data, but fortunately we have an API that hides this ugly implementation detail, so it has no impact on our users.
2. Another thing to consider is the tradeoff between the number of documents and their size. In many use cases it's more efficient to store a smaller number of large documents rather than a large number of small ones.
We've previously seen how padding occupies memory; by changing the padding factor and running repair every so often you can reduce the padding overhead.
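The key-shortening approach can be sketched as a thin translation layer between readable field names and the short keys actually stored. The field names and mapping below are hypothetical, not our actual schema:

```python
# Hypothetical long-key -> short-key mapping hidden behind the API layer.
LONG_TO_SHORT = {"first_name": "fn", "last_name": "ln", "creation_time": "ct"}
SHORT_TO_LONG = {v: k for k, v in LONG_TO_SHORT.items()}

def to_storage(doc):
    """Translate a user-facing document to the short-key form stored in MongoDB."""
    return {LONG_TO_SHORT.get(k, k): v for k, v in doc.items()}

def from_storage(doc):
    """Translate a stored document back to readable long keys."""
    return {SHORT_TO_LONG.get(k, k): v for k, v in doc.items()}

stored = to_storage({"first_name": "Alon", "creation_time": 1700000000})
print(stored)  # keys are now "fn" and "ct"
```

Because every read and write goes through these two functions, callers never see the obscured keys, which is what keeps the optimization invisible to users.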
  
	
  
The next thing you can optimize is indices.
  
37	
  
First thing you should know is that unused indices are still accessed whenever documents are inserted, updated or deleted. Try to identify those and remove them.
Use sparse indices when only some of the documents will have the indexed attribute, as they use less space.
The last thing I want to talk about is how much of the index is located in memory. The answer is: it depends.

If the entire index is accessed by queries, then the entire index should be located in memory. If only a single part of the index is used, only that part has to fit in memory.
Let's look at a few examples to emphasize the difference. You can imagine an index as a segment of memory; the red marks are locations frequently accessed by queries.

The first example is an index on a date field called creation_time. Each inserted document inserts the largest value of all previous ones, so the rightmost part of the index is updated.
In many such indexes only the recent history is often accessed, so only the rightmost part of the index will be located in memory.
The second example is an index on a person's name. The index accesses will probably distribute evenly across the entire index, so most of it will be located in memory.
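To make the sparse-index saving concrete, here's a small simulation with made-up documents: a regular index carries an entry for every document (indexing a missing field as null), while a sparse index only carries entries for documents that actually have the field. With pymongo, the real thing would be `coll.create_index("nickname", sparse=True)`.

```python
# Hypothetical documents; only one of them has the "nickname" field.
docs = [
    {"_id": 1, "name": "alice", "nickname": "al"},
    {"_id": 2, "name": "bob"},
    {"_id": 3, "name": "carol"},
]

# Regular index: one (key, _id) entry per document, null when missing.
regular_entries = [(d.get("nickname"), d["_id"]) for d in docs]

# Sparse index: entries only for documents that carry the field.
sparse_entries = [(d["nickname"], d["_id"]) for d in docs if "nickname" in d]

print(len(regular_entries), len(sparse_entries))  # 3 1
```

The fewer entries the index holds, the less of it competes for space in RAM.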
  
38	
  
So let's summarize what we've learned:
1. We've seen how memory management works. We started from the disk and RAM, went up the stack to the page cache, whose sole purpose is to improve read and write performance by using memory. We continued to memory mapped files, which translate memory accesses like reads and writes into file reads and writes. And we finished with MongoDB's usage of these mechanisms.
2. We've talked about the challenges this strategy presents, like predicting and measuring the size of the working set.
3. We then talked about monitoring, which is something you have to do if you have a DB running in production.
4. We finished with schema and index optimizations, which are crucial for cutting costs and improving performance.
  
39	
  
And that's it! I hope you enjoyed my talk, and thanks for having me.
  
	
  
	
  
40	
  

Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...
Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdf
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 

MongoDB memory management demystified

  • 2. Hello everyone, my name is Alon Horev. I'm based in Israel and I work at Intucell, which was acquired by Cisco. I'm a Python developer and I lead Intucell's data team. About two years ago we migrated our product off MySQL and started working with MongoDB. I want to start by introducing our use case for MongoDB: we've built a system that optimizes cellular networks automatically. Optimizing cellular networks is about making your data connection faster and improving the quality of your calls.
  • 3. The way we do this is pretty simple: we collect a lot of statistics about what goes on in the network, like how many calls are taking place or how many users are connected to an antenna. We then analyze this information to identify things like which antennas are overloaded. Once we know what the problems in the network are, we act: we change parameters in the network. For example, we might force your phone to use a different antenna so you get better service. As you can see, this process is cyclic: we collect more statistics to make further changes and verify that we improved the network. This happens all the time, even here right now, with AT&T. In the process of working with MongoDB we learned a lot about database performance and server performance. I personally spent a lot of time monitoring and optimizing storage and memory usage, which brings me to this lecture.
  • 4. Today I'm going to try to give you an understanding of how MongoDB manages memory. So, first, what is "memory management" when it comes to MongoDB? Well, memory is a fast but limited and expensive resource; memory management is about deciding which data to keep in memory.
  • 5. Why should you care about memory management? It has a huge impact on performance and costs. This matters to both developers and DBAs: as a developer you can optimize the schema and queries for better memory usage, and as a DBA you can monitor and predict performance issues related to memory usage. I'm pretty sure every MongoDB administrator has asked himself at least once: how much memory do I really need? Before we dive in, I want to tell you a little secret: MongoDB doesn't actually manage memory. It leaves that responsibility to the operating system.
  • 6. Within the operating system there's a stack of components that MongoDB depends on to manage memory. Each component relies on the component below it. (!) This talk is structured around this stack of components. We'll start with the low-level components, the storage devices: disks and RAM. We'll continue with the page cache and memory-mapped files, which are part of the operating system's kernel, and we'll finish with MongoDB's usage of these mechanisms. (!) Let's talk about storage.
  • 7. There are different types of storage devices with different characteristics; we'll review hard disk drives (HDDs), solid state drives (SSDs) and RAM. Let's start by breaking these into categories: (!) HDDs and SSDs are persistent and RAM isn't, but RAM is really fast. That's why every computer has both types of storage: one persistent (an HDD or an SSD) and one volatile (RAM).
  • 8. Now let's compare throughput. As I said before, RAM is fast: it can go as fast as 6400 MB/s for reads and writes. SSDs are about 10 times slower than RAM; modern SSDs can reach a read rate of 650 MB/s and a little less for writes. HDDs are much slower, ranging from 1 MB/s to 160 MB/s for reads and writes. The reason there's such variance in HDD speed is that throughput is highly affected by access patterns. Specifically with HDDs, random access is much slower than sequential access, because an HDD contains a mechanical arm that has to move on almost every random access. Sadly for us, databases do a lot of random I/O, which means that if you're running a query on data that's not in memory and it therefore has to be read from disk, you're seeing a penalty of about two orders of magnitude on response times. The next characteristic is price. (!) To make the comparison easier we'll compare the price per GB. It's not surprising that there's a correlation between price and throughput: the more you pay per GB, the better the throughput. Hard drives are really cheap at 5 cents per GB, SSDs are 10 times more expensive, and RAM is 100 times more expensive.
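To get a feel for these throughput figures, here is a back-of-envelope calculation using the numbers from the slide; the 16 GB working set is an assumed example size, not from the talk:

```python
# Time to stream a working set sequentially at the slide's throughput figures.
THROUGHPUT_MB_PER_S = {"ram": 6400, "ssd": 650, "hdd": 160}  # sequential reads

def seconds_to_read(size_gb, device):
    """Seconds needed to read size_gb gigabytes from the given device."""
    return size_gb * 1024 / THROUGHPUT_MB_PER_S[device]

for device in ("ram", "ssd", "hdd"):
    print(f"{device}: {seconds_to_read(16, device):.1f}s")
```

Random access on an HDD would be far worse than this sequential figure, which is exactly the penalty the slide describes.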
  • 9. Is this information sufficient to choose the optimal hardware configuration? I think not; your application's requirements are also part of the equation. For example, if your application is an archive that stores huge amounts of data that is rarely accessed, you can go for a large HDD and save a lot of money. Later on we'll see how you can take measurements of things like RAM usage and capacity, and then you'll be able to determine what kind of hardware configuration you need.
  • 10. Now let's zoom out of storage and move up to the next layer, which is the page cache.
  • 11. The page cache is part of the operating system's kernel, and whenever a program does file I/O, such as reads and writes, it always goes through the page cache. The page cache makes reads faster by keeping popular chunks of data in memory, and makes writes faster by letting the application write to memory rather than to disk. You could say the page cache was invented to combine the disk's persistence with memory's speed: it's about having the best of both worlds.
  • 12. So... it's called the page cache, but what is a page? A page is a 4 KB chunk of data. Each file is broken into pages; the number of pages belonging to a file is simply the file's size divided by 4 KB. (!) Looking at the example, you can see a file spanning 3 pages because it's 10 kilobytes in size. The grey area is the unused part of the last page, since the file's size isn't a multiple of 4 kilobytes. The page cache's job is to determine which pages to keep in memory.
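The page arithmetic from the slide can be written down in a couple of lines:

```python
import math

PAGE_SIZE = 4096  # bytes

def pages_for_file(size_bytes):
    """Number of 4 KB pages a file of the given size spans."""
    return math.ceil(size_bytes / PAGE_SIZE)

# The 10 KB file from the slide spans 3 pages; the tail of the third
# page is the unused grey area.
print(pages_for_file(10 * 1024))                           # -> 3
print(pages_for_file(10 * 1024) * PAGE_SIZE - 10 * 1024)   # unused bytes -> 2048
```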
  • 13. Let's dive a little deeper and see what happens behind the scenes when we read from a file. (!) We have a process running in user space and it reads 100 bytes from a file. (!) Through a system call we get to the kernel, where the page cache handles the read request. (!) First, the page cache translates the position and count of bytes to read into a list of pages. If we read 100 bytes from the beginning of the file, the result of this step would be the first page. (!) Next, the page cache checks whether the page exists in the cache; (!) if it doesn't, the data has to be read from disk and is then stored in the cache. Once the page is in the cache we reach the last step, (!) which is to copy the data to the user-space application. So that's how a read works.
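The first step of that read path, translating a byte range into a list of pages, can be sketched as:

```python
PAGE_SIZE = 4096

def pages_for_read(offset, count):
    """Translate an (offset, count) byte read into the list of page numbers
    the page cache has to look up, as in the first step on the slide."""
    if count <= 0:
        return []
    first = offset // PAGE_SIZE
    last = (offset + count - 1) // PAGE_SIZE
    return list(range(first, last + 1))

print(pages_for_read(0, 100))      # 100 bytes from the start -> [0]
print(pages_for_read(4000, 200))   # a read straddling a page boundary -> [0, 1]
```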
  • 14. The page cache also handles writes. (!) This time our process calls the write system call. (!) The page cache copies the data from the process into the relevant pages and marks them as dirty. That's all it does: change data in memory. It gives the impression the data has been written, when in fact it has been written only to memory and not to disk. If an application reads from the file it gets the latest data from memory, because dirty pages must stay in the cache. Having dirty pages is somewhat dangerous for two reasons: first, they will be lost if the operating system crashes; second, if there's a lack of memory they can't be freed. The solution to these problems is to flush the dirty pages to disk. (!) There's a thread in the kernel that flushes pages after they've been in the cache for some time, or when memory needs to be freed. If a process wants to make sure the data is flushed to disk it can call the fsync system call, which triggers a flush for a specific file or even the entire file system. MongoDB calls that every 30 seconds to make sure data is backed by disk.
  • 15. I mentioned that the page cache frees pages when memory is running low; this procedure is called page reclamation. There are different page reclamation policies. A page reclamation policy is an algorithm that answers a simple question: "what's the next page that can be freed?" In Linux, the simple answer is: "the one that was least recently used". It turns out page reclamation happens all the time, even on healthy systems; it doesn't mean you're out of memory. That's because the page cache is greedy and will try to use all the free memory on your machine to cache the file system. To understand how much memory is used by the page cache you can use the free command.
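As a toy illustration of least-recently-used reclamation (and of why dirty pages complicate it), here is a minimal sketch; the real kernel is far more sophisticated than this:

```python
from collections import OrderedDict

class TinyPageCache:
    """A toy LRU page cache: illustrates reclamation, not the real kernel."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page number -> dirty flag, LRU first

    def access(self, page, dirty=False):
        if page in self.pages:
            dirty = dirty or self.pages.pop(page)  # keep dirty bit, bump to MRU
        elif len(self.pages) >= self.capacity:
            self.reclaim()
        self.pages[page] = dirty

    def reclaim(self):
        # Free the least recently used *clean* page; dirty pages must be
        # flushed to disk before they can be freed, so we skip them here.
        for page, dirty in self.pages.items():
            if not dirty:
                del self.pages[page]
                return page

cache = TinyPageCache(capacity=2)
cache.access(0, dirty=True)
cache.access(1)
cache.access(2)           # cache full: clean LRU page 1 goes, dirty page 0 stays
print(list(cache.pages))  # -> [0, 2]
```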
  • 16. free is a Linux program that displays memory usage statistics. Let's try to interpret its output. When running free with -g it prints units in GB. The first line shows the total amount of memory, 64 GB; out of that, 61 GB are used and 3 GB are free. Then, out of the 61 GB that are used, 55 GB are cached data. These are pages in the page cache. The second line counts the cached data as free, so suddenly we have only 5 GB of used memory. This is memory directly allocated by programs. The reason cached memory can be considered free is that even though it's in use, it will be released if programs need it: as soon as programs allocate memory and the free memory runs out, the page cache shrinks and frees pages.
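The arithmetic behind free's second line ("-/+ buffers/cache") is simple; the figures below mirror the slide's example, and the 1 GB of buffers is an assumption made so the numbers line up:

```python
# free's second line just moves the page cache (and buffers) from the
# "used" column to the "free" column. All figures are in GB.
total, used, free_, buffers, cached = 64, 61, 3, 1, 55

used_minus_cache = used - buffers - cached   # memory directly allocated by programs
free_plus_cache = free_ + buffers + cached   # memory available if programs need it

print(used_minus_cache, free_plus_cache)     # -> 5 59
```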
  • 17. The next component up the stack is memory-mapped files.
  • 18. Memory mapping of files is an alternative mechanism for reading and writing files. Instead of calling the read() and write() system calls, a process can map part of a file into memory, and every access the process makes to that memory translates into a file read or write. On the left you can see a process with a memory region that is mapped to a segment of a file: memory addresses 100 to 200 are mapped to a file segment that starts at 400 and ends at 500, so a write to memory address 100 is translated into a write to the file at address 400. Mapping a file into memory doesn't necessarily load its data into memory: if a process reads from a page that is not in memory, the infamous page fault is triggered. The code in the kernel that handles page faults tells the page cache to load the required pieces of data from disk and then serves the read. Memory mapping has several advantages over regular file I/O. First, it's fast: there's no system call involved and no copying of memory; reads and writes access memory that is allocated in the page cache. Second, it takes the responsibility for memory management away from the user: as we've seen, the page cache determines what's actually stored in memory.
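You can try this mechanism yourself with Python's standard-library mmap module; this small sketch writes to a file through plain memory access, with no write() call in between:

```python
# Memory-mapped file I/O with the stdlib mmap module: a slice assignment on
# the mapping lands in the file via the page cache.
import mmap, os, tempfile

path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)            # one page worth of zeroes

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:
        mm[100:105] = b"hello"         # plain memory access, not a write() syscall
        mm.flush()                     # like fsync: force dirty pages to disk

with open(path, "rb") as f:
    f.seek(100)
    print(f.read(5))                   # -> b'hello'
```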
  • 19. In this example two processes map the same region of a file into memory. Only one copy of this data will occupy memory, or even less if it's not accessed. Historically this mechanism was invented to reduce the memory usage of processes: whenever you execute a program, the program's code and its shared libraries are mapped into memory. So if you open 10 instances of Chrome, its code still appears only once in memory.
  • 20. Now let's see how Mongo uses this stack of components.
  • 21. (!) Mongo maps all its data into memory: the documents, the indexes and the journal. (!) When running top you can actually see how much memory is mapped and how much is used. (!) The left column, called VIRT, stands for virtual memory; once a process maps files into memory, they're accounted under virtual memory. When journaling is enabled, Mongo actually maps the data files twice, so this figure is twice the amount on disk, which is about 273 GB. RES stands for resident memory: the part of the virtual memory that's actually located in RAM. SHR stands for shared resident memory, so out of the 24 GB of resident memory, 23 GB is data from memory-mapped files, which is sharable.
  • 22. It turns out this very cool strategy for managing memory also has problems. The biggest problem is that MongoDB (!) has no control over what is kept in memory. You can't tell Mongo: promise me this document or collection stays in memory, thereby ensuring fast access. Why is this a problem? I'll give you some examples:
1. (!) The first example is warm-up: after restarting your server, none of the data is in memory, so for every page that is accessed for the first time a page fault is triggered and the query takes longer.
2. (!) The second example is what I call expensive queries: queries that aren't indexed well or that request data that is hardly ever accessed. When this happens, documents are loaded into memory at the cost of evicting other, more important documents. Why does this happen? As we've seen before, the page cache frees the least recently used pages first.
There are things you can do to mitigate this problem.
  • 23. What we did is (!) protect MongoDB with an API. The API enforces index usage so Mongo reads fewer documents into memory. Another thing the API does is pass a query timeout to make sure costly queries are cancelled. The API doesn't have to be complicated; it could be a simple module sitting on top of the MongoDB driver. Let's look at an example: (!) this is (!) a Python function called find_samples, used whenever we want to run a find query on the collection named samples. The function accepts two parameters that define a date range: start_time and end_time. By forcing the user to pass a date range we make sure the query is indexed. You could add further validations to make sure the range isn't too big or doesn't go too far back in history.
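The slide's actual code isn't reproduced in these notes, so here is a sketch of what such a wrapper could look like; the field name "timestamp", the MAX_RANGE limit and the 5-second timeout are assumptions, not the talk's values:

```python
# A hypothetical find_samples wrapper: force an indexed date range and a
# server-side timeout on every query against the samples collection.
from datetime import datetime, timedelta

MAX_RANGE = timedelta(days=14)  # assumed limit: refuse huge scans

def find_samples(collection, start_time, end_time, **extra_filters):
    if end_time <= start_time:
        raise ValueError("end_time must be after start_time")
    if end_time - start_time > MAX_RANGE:
        raise ValueError("date range too large, narrow the query")
    query = {"timestamp": {"$gte": start_time, "$lt": end_time}}
    query.update(extra_filters)
    # max_time_ms asks the server to cancel runaway queries (pymongo's
    # find() accepts it and maps it to the $maxTimeMS operator).
    return collection.find(query, max_time_ms=5000)
```

The point is that callers can never issue an unbounded, unindexed scan: the range check and the timeout travel with every query.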
  • 24. Another challenge worth mentioning is (!) the lack of prioritization between processes. When processes allocate a lot of memory, the page cache shrinks automatically, and since Mongo relies on the page cache, you could say Mongo's memory shrinks automatically. In other words, Mongo has a lower priority than other processes when it comes to memory. Since Mongo will simply become slower if it doesn't have enough memory, you need to be careful with other processes running on the same server. You can mitigate this by isolating Mongo: (!) don't run it on the same server as memory- or disk-intensive applications. The last challenge I'd like to tackle is (!) estimating how much memory is required, also known as the size of the working set.
  • 25. So what is the working set? It's the data your application reads regularly and that should be returned in a timely manner; therefore it should fit in memory. The working set contains (!) more than documents: it also includes indexes and some padding. To emphasize the padding issue, let's look at an example memory page. (!) As I mentioned before, a page's size is 4 KB. This page holds 3 documents with some padding between them; the padding accounts for expansion of existing documents or insertion of new ones. Of the three documents, only document number 2 is accessed regularly. So even though only a small part of this page is actually used, the whole page is kept in memory; the page cache can't keep half pages in memory. This brings us to the conclusion that it's really hard to measure the size of the working set by simply looking at the count or size of the documents being queried. Still, there are several tools to help you estimate how much memory a collection should require.
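To put numbers on the padding point, here is the example page with made-up document sizes (the slide doesn't give any): only one document is hot, yet the whole 4 KB page stays resident.

```python
# One 4 KB page holding three documents; only document 2 is hot.
# The document sizes are hypothetical, chosen for illustration.
PAGE_SIZE = 4096
doc_sizes = {1: 1100, 2: 900, 3: 1200}   # bytes, plus padding fills the rest
hot_docs = {2}

hot_bytes = sum(size for doc, size in doc_sizes.items() if doc in hot_docs)
print(f"{hot_bytes / PAGE_SIZE:.0%} of the page is hot, but 100% stays resident")
```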
  • 26. The tools fall into two categories: planning and monitoring.
  • 27. Planning is about predicting how much memory each collection is going to need. Let's take a real-world example. In one of our collections we keep a month of history. Out of that month, we know our application often queries the last two weeks and sometimes the week before that. The last two weeks are considered "hot" data because they have to be in memory; the week before that is considered warm: it doesn't have to be in memory, but we should still take it into account so it won't push out the hot data. If we take some spare room to compensate for padding and such, it's safe to assume 3 out of the 4 weeks should fit in memory. (!) You can use the collection stats command to get important metrics like the size of the indexes and the size of the data, and roughly calculate how much memory the collection is going to require. Once you have a running database you can use several monitoring tools to analyze the working set.
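The planning arithmetic for this example might look like the sketch below; the stats numbers are invented stand-ins for what db.samples.stats() would return (size and totalIndexSize are in bytes):

```python
# Rough capacity planning: 3 of the 4 weeks of data should fit in memory,
# and indexes are usually hot in full.
stats = {"size": 40 * 1024**3, "totalIndexSize": 6 * 1024**3}  # hypothetical

HOT_FRACTION = 3 / 4  # hot + warm weeks out of the month of history

required_gb = (stats["size"] * HOT_FRACTION + stats["totalIndexSize"]) / 1024**3
print(f"{required_gb:.0f} GB")
```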
  • 28. When I think about monitoring tools, they generally fall into two categories:
1. (!) Online monitoring, which is basically seeing what's going on at the moment. This category includes Linux commands like top and iostat, and Mongo commands like currentOp, mongostat and mongomem.
2. (!) Offline monitoring, which is more about collecting and aggregating historical data. One example is the profiling collection that records slow queries over time; another is MMS or other graphing tools like Graphite that collect different metrics over time. These are used for identifying trends and correlations and for predicting growth.
Let's start with the online tools.
  • 29. mongomem is a great tool for memory-usage analysis. It's written in Python by the people at a company called Wish, so you'll have to install it manually; it doesn't come packaged with MongoDB. mongomem won't tell you how much memory you need, but it will tell you how much memory each collection is using at the moment. Here's an example output: (!) each line shows how many megabytes of the collection are in memory. The top collection in this example is the oplog, with more than 11 GB of data in memory out of almost 50 GB, so about 22% of the collection is in memory. The last line shows the total amount of memory used by Mongo out of the total data size: in this example we have 16 GB of data in memory out of 280 GB of total data. Since I've got 16 GB of memory on this machine, we can see all the memory is being used. But what does this say about the working set? Is it larger than memory? In other words, do we have enough memory? Well, we can't say, because it's possible there's data in memory that is hardly ever accessed; the page cache just hasn't had to reclaim those pages.
  • 30. What you can do in order to test how much RAM Mongo actually uses is the following procedure:
1. First, stop the database.
2. Then clear the page cache; the command shown invokes code in the kernel that drops all pages from memory.
3. Next, start the database.
4. After that, invoke the queries that should cover your working set: queries that access all the data you expect to have in memory.
5. At this point, running mongomem will give you a much more accurate picture of how much memory is required.
  • 31. Before looking at additional tools, I want to answer a simple question: how do we know when something is wrong? What do we need to monitor? And since we're talking about memory: how do we know we don't have enough of it? Well, the phenomenon of not having enough memory is called thrashing. When the OS is thrashing, an application is constantly accessing pages that are not in memory, and the OS is busy handling the page faults, reading the pages from disk. So the first thing to monitor is page faults (!), and since it's hard to tell how many page faults are too many, you should also look at disk utilization: if the disk is utilized 100% of the time, you're in trouble. There are a lot of other things that go wrong, like (!) many queries being queued and high locking ratios, but those are just symptoms.
  • 32. I usually use iostat to look at disk utilization. Here's an example output of the command: the rightmost column shows the disk utilization and reveals a disk that is busy 100% of the time. The second column shows the disk serving 570 reads per second, and the third column shows the number of writes per second, which is zero. If this is happening constantly, the working set does not fit in memory. Along with iostat, I frequently use mongostat.
  • 33. mongostat comes packaged with MongoDB and uses the underlying (!) serverStatus command. It displays a bunch of interesting metrics, like (!) the number of page faults and queued reads. It's pretty hard to say how many page faults are too many, but more than one or two hundred page faults per second indicate a lot of data being read from disk. If this happens over long periods of time, it can indicate that the working set does not fit in RAM. If the number of queued reads is larger than a hundred over long periods of time, that can also indicate the working set doesn't fit in RAM. It's often important to look at these parameters over time in order to determine whether there's a sudden spike or a recurring problem. This brings me to offline monitoring.
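A minimal health check applying these rules of thumb could look like this; the thresholds are the slide's rough figures, and how you sample the counters (mongostat output, serverStatus deltas) is left open:

```python
# Rules of thumb from the slide, turned into a tiny check.
FAULTS_PER_SEC_LIMIT = 200   # "one or two hundred page faults per second"
QUEUED_READS_LIMIT = 100     # queued readers over long periods

def working_set_warnings(faults_per_sec, queued_readers):
    """Return human-readable warnings if the working set may not fit in RAM."""
    warnings = []
    if faults_per_sec > FAULTS_PER_SEC_LIMIT:
        warnings.append(f"{faults_per_sec:.0f} page faults/s: heavy disk reads")
    if queued_readers > QUEUED_READS_LIMIT:
        warnings.append(f"{queued_readers} queued readers: queries are piling up")
    return warnings

print(working_set_warnings(350, 20))   # -> ['350 page faults/s: heavy disk reads']
print(working_set_warnings(50, 10))    # -> []
```

As the slide stresses, a single sample means little; these checks only become meaningful over long windows.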
  • 34. Tools like (!) MMS or Graphite can show you these important metrics over time. Using one of these tools is (!) mandatory for a production system; I cannot tell you how useful they are. Whenever we get a ticket about a performance problem, we put on our Sherlock hats and start an investigation. We look at metrics related to our application, but also at a lot of metrics related to Mongo and how they change over time: the number of queries, the number of documents in collections and tens of other metrics. I'd like to show you an example workflow of a ticket. Try to picture this: it was a quiet evening, I was about to go to sleep, when I got an automated email that one of our shards was misbehaving. What were the symptoms? It had more than 300 queries just waiting in the queue. What did I do next?
  • 35. I immediately opened Graphite. This is a screenshot of the number of page faults in green and the number of queued readers in blue. By looking at the history you can spot two trends:
1. First, there's a spike of high load every hour. This is actually normal, since we do hourly aggregations of our data.
2. Second, there's a massive rise in page faults and queued queries at exactly 20:00. At this point there's an impact on users, as a lot of queries take a very long time.
Why is this happening? Has the working set outgrown memory?
  • 36. Let's look at another screenshot of the same time frame, this time with other metrics: in blue the number of queries, in green the number of updates, and in red the disk utilization. Remember that disk utilization is measured as a percentage, so even though this graph is lower than the others, we can still see that at 20:00 the disk was constantly utilized at 100%. Looking at updates vs. queries, it's obvious that a huge amount of updates was hurting query performance: we were busy writing to disk. In this case an application change was the root cause of the problem; the application had simply started updating a lot more documents. So using Graphite we were able to trace the problem to a specific change in our application, and later on we modified our schema to reduce the document size and the load on the disk. This brings me to the next topic, which is optimization.
• 37. When optimizing memory usage, the main target is to reduce the amount of memory your application requires. The smaller your collections and documents are, the faster your queries will be, and not just in terms of memory but also disk: if documents are smaller, less disk access is required to read them.
There are several optimizations you can do when it comes to schema:
1. First, shorten the keys. We started with long names like firstName, then shortened them to a single word or acronym, and finally used one or two letters, since it had a huge impact on the size of our data. By shortening the keys we reduced the size of our data by more than 50%. There is a big downside to doing this because it obscures the data, but fortunately we have an API that hides this ugly implementation detail, so it doesn't have an impact on our users.
2. Another thing to consider is the tradeoff between the number of documents and their size; in many use cases it's more efficient to store a small number of large documents vs. a large number of small ones.
We've previously seen how padding occupies memory; by changing the padding factor and running repair every so often, you can reduce the padding overhead.
The next thing you can optimize is indices. 37
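To make the key-shortening idea concrete, here is an illustrative sketch of a mapping layer like the API mentioned above. The field names and short-key mapping are invented, and BSON sizes are approximated with JSON, but the relative savings on key-heavy documents are similar.

```python
import json

# Illustrative key-shortening layer: applications read and write long
# field names, while the stored documents use one- or two-letter keys.
# KEY_MAP is a made-up example mapping, not our real schema.

KEY_MAP = {"firstName": "fn", "lastName": "ln", "creationTime": "ct"}
REVERSE_MAP = {v: k for k, v in KEY_MAP.items()}

def shorten(doc):
    """Translate long application-facing keys to short stored keys."""
    return {KEY_MAP.get(k, k): v for k, v in doc.items()}

def expand(doc):
    """Translate short stored keys back to long application-facing keys."""
    return {REVERSE_MAP.get(k, k): v for k, v in doc.items()}

doc = {"firstName": "Alon", "lastName": "Horev", "creationTime": 1}
short = shorten(doc)

assert expand(short) == doc  # the API round-trips; users never see "fn"
print(len(json.dumps(doc)), len(json.dumps(short)))  # long vs. short encoding
```

Because MongoDB stores every key name inside every document, the saving multiplies across the whole collection, which is why it mattered so much at our scale.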
• 38. The first thing you should know is that unused indices are still accessed whenever documents are inserted, updated or deleted. Try to identify those and remove them.
Use sparse indices when only some of the documents have the indexed attribute, as they use less space.
The last thing I want to talk about is how much of an index is located in memory. The answer is: it depends. If the entire index is accessed by queries, then the entire index should be located in memory. If only a single part of the index is used, only that part has to fit in memory.
Let's look at a few examples to emphasize the difference. You can imagine an index as a segment of memory, where the red marks are locations frequently accessed by queries.
The first example is an index on a date field called creation_time. Each inserted document carries a value larger than all previous ones, so the right-most part of the index is updated. In many such indexes only the recent history is frequently accessed, so only the right-most part of the index will be located in memory.
The second example is an index on a person's name; the index accesses will probably distribute evenly across the entire index, so most of it will be located in memory. 38
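A rough way to see the sparse-index saving, as a purely illustrative sketch with invented sample documents: a sparse index holds one entry per document that actually has the field, while a regular index holds an entry (possibly null) for every document in the collection.

```python
# Illustrative comparison of index entry counts for a regular vs. a
# sparse index on "nickname". Documents are made-up examples.

docs = [
    {"_id": 1, "name": "alice", "nickname": "al"},
    {"_id": 2, "name": "bob"},                    # no nickname
    {"_id": 3, "name": "carol"},                  # no nickname
    {"_id": 4, "name": "dave", "nickname": "d"},
]

regular_entries = len(docs)                       # every document is indexed
sparse_entries = sum(1 for d in docs if "nickname" in d)

print(regular_entries, sparse_entries)  # 4 2
```

When the indexed attribute is rare, the sparse index can be a small fraction of the regular one, which means less memory and disk.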
• 39. So let's summarize what we've learned:
1. We've seen how memory management works. We started from the disk and RAM, went up the stack to the page cache, whose sole purpose is to improve read and write performance by using memory. We continued to memory-mapped files, which translate memory accesses like reads and writes into file reads and writes. And we finished with MongoDB's usage of these mechanisms.
2. We've talked about the challenges this strategy presents, like predicting and measuring the size of the working set.
3. We then talked about monitoring, which is something you have to do if you have a DB running in production.
4. We finished with schema and index optimizations, which are crucial for cutting costs and improving performance. 39
• 40. And that's it! I hope you enjoyed my talk, and thanks for having me. 40