1

Hello everyone, my name is Alon Horev, I'm based in Israel and I'm working at Intucell, which was acquired by Cisco.

I'm a Python developer and I lead Intucell's data team. About two years ago we migrated our product off MySQL and started working with MongoDB.

I want to start off by introducing our use case of MongoDB: we've built a system that optimizes cellular networks automatically. Optimizing cellular networks is about making your data connection faster and improving the quality of your calls.
2

The way we do this is pretty simple: we collect a lot of statistics about what goes on in the network, like how many calls are taking place or how many users are connected to the antenna.

We then analyze this information to identify things like which antennas are loaded. Once we know what the problems in the network are, we act: we change parameters in the network. For example, we would force your phone to use a different antenna so you'll get better service.

Now, as you can see, this process is cyclic: we'll collect more statistics to make further changes and make sure we've improved the network. This happens all the time, even here right now, with AT&T.

In the process of working with MongoDB we learned a lot about database performance and server performance. I personally spent a lot of time monitoring and optimizing the storage and memory usage, which brings me to this lecture.
	
  
3

Today I'm going to try to give you an understanding of how MongoDB manages memory.

So, first, what is "memory management" when it comes to MongoDB?
Well, memory is a fast but limited and expensive resource; memory management is about deciding what data to keep in memory.
4

Why should you care about memory management?
Memory management has a huge impact on performance and costs.
This matters to both developers and DBAs: as a developer you can optimize the schema and queries for better memory usage, and as a DBA you can monitor and predict performance issues related to memory usage.
I'm pretty sure every MongoDB administrator has asked himself at least once: how much memory do I really need?

Before we dive in I want to tell you a little secret: MongoDB doesn't actually manage memory. It leaves that responsibility to the operating system.
5

Within the operating system there's a stack of components that MongoDB depends on to manage memory. Each component relies on the component below it.
(!)
This talk is structured around this stack of components.
We'll start with the low-level components, the storage devices: disks and RAM.
We'll continue with the page cache and memory-mapped files, which are part of the operating system's kernel.
And we'll finish off with MongoDB's usage of these mechanisms.
(!)

Let's talk about storage.
6

There are different types of storage devices with different characteristics; we'll review hard disk drives, solid state drives and RAM.

Let's start by breaking these into categories: (!) HDDs and SSDs are persistent and RAM isn't, but RAM is really fast.
That's why every computer has both types of storage: one persistent (an HDD or an SSD) and one volatile (RAM).
7

Now let's compare throughput. As I said before, RAM is fast; it can go as fast as 6400 MB/s for reads and writes.
SSDs are 10 times slower than RAM; modern SSDs can reach a read rate of 650 MB/s and a little less for writes.
HDDs are much slower, ranging from 1 MB/s to 160 MB/s for reads and writes.

The reason there's such variance in HDD speed is that throughput is highly affected by access patterns. Specifically with HDDs, random access is much slower than sequential access, because an HDD contains a mechanical arm that needs to move on almost every random access.
Sadly for us, databases do a lot of random I/O. Which means that if you're running a query on data that's not in memory, so it has to be read from disk, you're seeing a penalty of about two orders of magnitude on response times.

The next characteristic is price. (!)
To make the comparison easier we'll compare the price per GB. It's not surprising that there's a correlation between price and throughput: the more you pay for each GB, the better throughput you get. So hard drives are really cheap at 5 cents per GB, SSDs are 10 times more expensive and RAM is 100 times more expensive.
8

Is this information sufficient to choose the optimal hardware configuration? I think not; your application's requirements are also part of the equation.
For example, if your application is an archive that saves huge amounts of rarely accessed data, you can go for a large HDD and save a lot of money.
Later on we'll see how you can take measurements of things like RAM and capacity, and then you'll be able to determine what kind of hardware configuration you need.
9

Now let's zoom out of storage and move up to the next layer, which is the page cache.
10

The page cache is part of the operating system's kernel, and whenever a program does file I/O, like reads and writes, it always goes through the page cache.
The page cache makes reads faster by saving popular chunks of data in memory, and makes writes faster by letting the application write to memory and not to disk.
So we can say the page cache was invented to combine the disk's persistence with memory's speed. It's about having the best of both worlds.
11

So... it's called the page cache, but what is a page?

A page is a 4K chunk of data. Each file is broken into pages; the number of pages belonging to a file is simply the file's size divided by 4K.
(!)
Looking at the example, you can see a file spanning 3 pages because it's 10 kilobytes in size. The grey area is an unused part of the last page, as the file's size isn't a multiple of 4 kilobytes.

The page cache's job is to determine which pages to save in
  memory.	
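The page arithmetic is simple enough to check; a minimal sketch (4096 bytes is the usual page size on Linux, and this helper is mine, not kernel code):

```python
PAGE_SIZE = 4096  # the usual page size on Linux

def pages_for(file_size: int) -> int:
    # Round up: a partial page still occupies a whole page.
    return -(-file_size // PAGE_SIZE)

# The 10KB file from the slide spans 3 pages; the tail of the
# third page is the unused grey area.
print(pages_for(10 * 1024))  # → 3
```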
  	
  
12

Let's dive a little deeper and see what happens behind the scenes when we read from a file.
(!)
We have a process running in user space and it's reading 100 bytes from a file.
(!)
Through a system call we get to the kernel, where the page cache handles the read request.
(!)
First, the page cache translates the position and count of bytes to read into a list of pages. If we read 100 bytes from the beginning of the file, the result of this step would be the first page.
(!)
The next thing the page cache does is check whether the page exists in the cache; (!) if it doesn't, the data has to be read from disk and then stored in the cache.
Once the page is in the cache we reach the last step, (!) which is to copy the data to the user space application.

So that's how a read
  works.	
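That first translation step, from a byte range to a list of pages, can be sketched as a small helper (a simplification of what happens inside the kernel):

```python
PAGE_SIZE = 4096

def pages_for_read(offset: int, count: int) -> list:
    """Page indices covered by a read of `count` bytes at `offset`."""
    first = offset // PAGE_SIZE
    last = (offset + count - 1) // PAGE_SIZE
    return list(range(first, last + 1))

# Reading 100 bytes from the start of the file touches only page 0.
print(pages_for_read(0, 100))     # → [0]
# A read straddling a page boundary touches two pages.
print(pages_for_read(4000, 200))  # → [0, 1]
```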
  
	
  
13

The page cache also handles writes.
(!)
This time our process is calling the write system call.
(!)
The page cache copies the data from the process to the relevant pages and marks them as dirty. That's all it does: change data in memory.
It gives the impression the data has been written, when in fact it has been written only to memory and not to disk. If an application reads from the file it will get the latest data from memory, because dirty pages must stay in the cache.

Having dirty pages is somewhat dangerous for two reasons: first, they will be lost if the operating system crashes. Second, if there's a lack of memory they can't be freed.
The solution to these problems is to flush the dirty pages to disk. (!) There's a thread in the kernel that flushes pages after they stay in the cache for some time or when memory needs to be freed.

If a process wants to make sure the data is flushed to disk it can call the fsync system call, which can trigger a flush for a specific file or even the entire file system.
MongoDB calls that every 30 seconds to make sure data is backed by
  disk.	
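From Python the same flush can be requested explicitly; a minimal sketch (the file path is just for illustration):

```python
import os

# The write lands in the page cache first, marking the page dirty.
fd = os.open("/tmp/dirty-page-demo",
             os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
os.write(fd, b"important data")

# fsync blocks until this file's dirty pages are flushed to disk;
# only after it returns is the data safe from an OS crash.
os.fsync(fd)
os.close(fd)
```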
  
	
  
	
  
	
  
	
  
	
  
14

I mentioned how the page cache frees pages when memory is running low; this procedure is called page reclamation.

There are different page reclamation policies. A page reclamation policy is an algorithm that answers a simple question: "what's the next page that can be freed?"
In Linux, the simple answer is: "the one that was least recently used".

It turns out page reclamation happens all the time, even on healthy systems; it doesn't mean you're out of memory.
That's because the page cache is greedy and will try to use all the free memory on your machine to cache the file system.

To understand how much memory is used by the page cache you can use the free command.
  
15

free is a Linux program that displays memory usage statistics. Let's try to interpret its output.
When running free with -g it prints units in GB. The first line shows the total amount of memory, which is 64GB; out of these, 61GB are used and 3GB are free.
Then, out of the 61GB that are used, 55GB are cached data. These are pages in the page cache.
The second line counts the cached data as free, so we suddenly have only 5GB of used memory. This is memory directly allocated by programs.
The reason cached memory can be considered free is that even though the memory is used, it will be freed if programs need it.
As soon as programs allocate memory and the free memory runs out, the page cache shrinks and frees
  pages.	
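The relationship between the two lines is just arithmetic; here's a sketch with the numbers from this example (the 1GB of buffers is an assumption to make the figures add up — the example only states the cached amount):

```python
def interpret_free(total, used, free, buffers, cached):
    # The second ("-/+ buffers/cache") line treats buffered and cached
    # pages as available, because the kernel reclaims them on demand.
    return {
        "used_by_programs": used - buffers - cached,
        "effectively_free": free + buffers + cached,
    }

# 64GB total, 61GB used, 55GB of it page cache -> only 5GB truly used.
print(interpret_free(total=64, used=61, free=3, buffers=1, cached=55))
# → {'used_by_programs': 5, 'effectively_free': 59}
```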
  
	
  
16

The next component up the stack is memory-mapped files.
  
	
  
17

Memory mapping of files is an alternative mechanism for reading from and writing to files. Instead of calling the read() and write() system calls, a process can map a part of a file into memory, and every access the process makes to that memory translates to a file read or write.

On the left you can see a process with a memory region that is mapped to a segment of a file.
So memory addresses 100 to 200 are mapped to a file segment that starts at 400 and ends at 500.
A write to memory address 100 is translated to a write to the file at address 400.

Mapping a file into memory doesn't necessarily load its data into memory; if a process reads from a page that is not in memory, the infamous page fault is triggered.
The code in the kernel that handles page faults tells the page cache to load the required pieces of data from disk and then serves the read.

So memory mapping has several advantages over regular file I/O:
First, it's fast: there's no system call involved and no copying of memory. Reads and writes access memory that is allocated in the page cache.
Second, it takes the responsibility for memory management away from the user. As we've seen earlier, the page cache will determine what's actually stored in
  memory.	
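In Python the same mechanism is exposed through the standard `mmap` module; a minimal sketch (the file path and contents are just for illustration):

```python
import mmap

# Create a small file and map it into memory.
with open("/tmp/mmap-demo", "wb") as f:
    f.write(b"hello world")

with open("/tmp/mmap-demo", "r+b") as f:
    mem = mmap.mmap(f.fileno(), 0)  # length 0 maps the whole file
    print(mem[:5])        # reads are served from the page cache → b'hello'
    mem[:5] = b"HELLO"    # a plain memory write becomes a file write
    mem.flush()           # like fsync: force the dirty pages to disk
    mem.close()

print(open("/tmp/mmap-demo", "rb").read())  # → b'HELLO world'
```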
  
18

In this example two processes map the same region of a file into memory. Only one copy of this data will occupy memory, or even less if it's not accessed.
Historically this mechanism was invented to reduce the memory usage of processes. Whenever you execute a program, the program's code and its shared libraries are mapped into memory.
So if you open 10 instances of Chrome, its code still appears once in memory.
  
	
  
19

Now let's see how Mongo uses this stack of components.
  
20

(!)
Mongo maps all its data into memory. This includes the documents, the indexes and the journal.
(!)
When running top you can actually see how much memory is mapped and how much is used.
(!)
The left column, called VIRT, stands for virtual memory; once a process maps files into memory, they're accounted under virtual memory.
When using journaling, Mongo actually maps the data files twice, so this figure is twice the amount on disk, which is about 273GB.
RES stands for resident memory and is the amount of the virtual memory that's actually located in RAM.
SHR stands for shared resident memory. So out of the 24GB of resident memory, 23GB is data from memory-mapped files, which is sharable.
  
	
  
	
  
21

It turns out this very cool strategy for managing memory also has problems. The biggest problem is that MongoDB (!) has no control over what is saved in memory. You can't tell Mongo: promise me this document or collection is stored in memory, thereby ensuring fast access.

Why is this a problem? I'll give you some examples:
1. (!) The first example is warm-up: after restarting your server, none of the data is stored in memory, so for every page that is accessed for the first time a page fault will be triggered and the query will take longer.
2. (!) The second example is what I call expensive queries. Expensive queries are queries that aren't indexed well or request data that is hardly ever accessed. When these things happen, documents are loaded into memory at the cost of freeing other documents that are more important. Why does this happen? As we've seen before, the page cache frees the least recently used pages first.

There are things you can do to mitigate this problem.
  
22

What we did is (!) protect MongoDB with an API. The API enforces index usage so Mongo reads fewer documents into memory. Another thing the API does is pass a query timeout to make sure costly queries are cancelled.

The API doesn't have to be complicated; it could be a simple module sitting on top of the MongoDB driver.
Let's look at an example: (!) this is (!) a Python function called find_samples, and it's used whenever we want to run a find query on the collection named samples.
The function accepts two parameters that define a date range: start_time and end_time. By forcing the user to pass a date range we make sure the query is indexed. You could add further validations to make sure the range isn't too big or doesn't go too far back in
  history.	
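I don't have the original code, but a sketch of such a wrapper might look like this (the `time` field name and the 31-day limit are assumptions; the real version would also execute the query with a server-side timeout):

```python
import datetime

MAX_RANGE = datetime.timedelta(days=31)  # assumed limit: a month of history

def find_samples(start_time, end_time):
    """Build the filter for a find() on the samples collection,
    forcing callers to supply an indexed date range."""
    if end_time <= start_time:
        raise ValueError("end_time must be after start_time")
    if end_time - start_time > MAX_RANGE:
        raise ValueError("date range too large")
    # The real version would run this filter through the driver,
    # with a query timeout attached.
    return {"time": {"$gte": start_time, "$lt": end_time}}
```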
  
23

Another challenge worth mentioning is (!) the lack of prioritization between processes. When processes allocate a lot of memory the page cache shrinks automatically, and since Mongo relies on the page cache, you could say Mongo's memory shrinks automatically. In other words, Mongo has a lower priority than other processes when it comes to memory. Since Mongo will just become slower if it doesn't have enough memory, you need to be careful with other processes running on the same server.

You can mitigate this phenomenon by isolating Mongo. (!) Don't run it on the same server along with memory- or disk-intensive applications.

The last challenge I'd like to tackle is (!) estimating how much memory is required, also known as the size of the working set.
  
	
  
	
  
	
  
24

So what is the working set? It's the data that your application reads regularly and that should be returned in a timely manner; therefore it should fit in memory.
The working set contains (!) more than documents; it also includes indexes and some padding.
To emphasize the padding issue let's look at an example memory page.
(!)
As I mentioned before, a page's size is 4K.
This page includes 3 documents, and between the documents there's some padding. This padding accounts for expansion of existing documents or insertion of new ones.
Out of the three documents, only document number 2 is accessed regularly.
So even though a small part of this page is actually used, the whole page is saved in memory; the page cache can't save half pages in memory.

This brings us to the conclusion that it's really hard to measure the size of the working set by simply looking at the count or size of the documents being queried.

Still, there are several tools to help you estimate how much memory a collection should require.
  
	
  
25

The tools fall into two categories: planning and monitoring.
  
	
  
26

Planning is about predicting how much memory each collection is going to need.
Let's take a real-world example. In one of our collections we save a month of history; out of that month we know our application often queries the last two weeks and sometimes the week before that. The last two weeks are considered "hot data" because they have to be stored in memory; the week before that is considered warm: it doesn't have to be in memory, but we should still take it into account so it won't push out the hot data.

If we're going to add some spare room to compensate for padding and such, it's safe to assume 3 out of the 4 weeks should fit in memory.

(!)
You can use the collection stats command to get important metrics, like the size of the indexes and the size of the data, and roughly calculate how much memory the collection is going to
  require.	
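A back-of-the-envelope version of that calculation might look like this (`size` and `totalIndexSize` are real collStats fields; the hot fraction and padding factor are assumptions you'd tune per collection):

```python
def estimate_memory(stats, hot_fraction=0.75, padding=1.2):
    """Rough working-set estimate from a collStats document:
    the regularly queried slice of the data, padded, plus all indexes."""
    hot_data = stats["size"] * hot_fraction * padding
    # Indexes should fit in memory in their entirety.
    return hot_data + stats["totalIndexSize"]

# 40GB of data of which ~3 of 4 weeks are hot, plus 4GB of indexes.
gb = 1024 ** 3
print(estimate_memory({"size": 40 * gb, "totalIndexSize": 4 * gb}) / gb)
```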
  
	
  
Once you have a running database you can use several monitoring tools to analyze the working set.
  
27

When I think about monitoring tools, they generally fall into two categories:
1. (!) One is online monitoring, which is basically seeing what's going on at the moment. This category includes running Linux commands like top and iostat, or Mongo commands like currentOp, mongostat and mongomem.
2. (!) The second category is offline monitoring, which is more about collecting and aggregating historical data. One example would be the profiling collection that collects slow queries over time. Another example is MMS, or other graphing tools like Graphite, that collect different metrics over time; these are used for identifying trends, correlations and predicting growth.
Let's start with the online tools.
  
28

Mongomem is a great tool for memory use analysis. It's written in Python by the people at a company called Wish, so you'll have to install it manually; it doesn't come packaged with MongoDB.
Mongomem won't tell you how much memory you need, but it will tell you how much memory each collection is using at the moment.
Here's an example output: (!) each line shows how many megabytes of the collection are in memory. The top collection in this example is the oplog, with more than 11GB of data in memory out of almost 50GB of data, so about 22% of the collection is in memory.
The last line shows the total amount of memory used by Mongo out of the total data size; in this example we have 16GB of data in memory out of 280GB of total data.

Since I've got 16GB of memory on this machine, we can see all the memory is being used.
But what does this say about the working set? Is it larger than memory? In other words, do we have enough memory?

Well, we can't say, because it's possible there's data in memory that is hardly ever accessed. The page cache just hasn't had to reclaim those pages.
  
29

What you can do in order to test how much RAM Mongo actually uses is the following procedure:
1. The first thing you have to do is stop the database.
2. Then you need to clear the page cache; the following command invokes some code in the kernel that drops all pages from memory.
3. The next step is to start the database.
4. After that you need to invoke the queries that should cover your working set: queries that access all the data you expect to have in memory.
5. At this point, running mongomem will give you a more accurate picture of how much memory is required.
  
30

Before looking at additional tools I want to answer a simple question: how do we know when something is wrong? What do we need to monitor?
And since we're talking about memory, how do we know we don't have enough of it?

Well, the phenomenon of not having enough memory is called thrashing.
When the OS is thrashing, it's because an application is constantly accessing pages that are not in memory, and the OS is busy handling the page faults, reading the pages from disk.

So the first thing to monitor is page faults (!), and since it's hard to tell how many page faults are too many, you should also look at disk utilization: if the disk is utilized 100% of the time, you're in trouble.
There are a lot of other things that go wrong, like (!) a lot of queries being queued and high locking ratios, but these are just symptoms.
  
	
  
	
  
31

I usually use iostat for looking at disk utilization.

Here's an example output of the command. The rightmost column shows the disk utilization and reveals a disk that is busy 100% of the time.
The second column shows the disk serves 570 reads per second, and the third column shows the number of writes per second, which is zero.
If this is happening constantly, the working set does not fit in
  memory.	
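If you want to alert on this automatically, the %util figure is easy to pull out of `iostat -x` output, where it's the last column of each device line; a sketch (the sample line is made up to mirror the numbers above):

```python
def util_percent(iostat_device_line: str) -> float:
    """%util is the last column of an `iostat -x` device line."""
    return float(iostat_device_line.split()[-1])

# A made-up device line: 570 reads/s, 0 writes/s, 100% utilized.
line = "sda 0.00 0.00 570.00 0.00 4560.00 0.00 8.00 1.00 1.75 1.75 100.00"
print(util_percent(line))  # → 100.0
```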
  
	
  
Along with iostat, I frequently use mongostat.
  
32	
  
Mongostat comes packaged with MongoDB and uses the underlying serverStatus command. It displays a bunch of interesting metrics, like the number of page faults and queued reads.
It's pretty hard to say how many page faults are too many, but more than one or two hundred page faults per second is an indication of a lot of data being read from disk. If this happens over long periods of time, it could be an indication that the working set does not fit in RAM.
If the number of queued reads is larger than a hundred over long periods of time, it could also be an indication that the working set doesn't fit in RAM.
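The page-fault rate isn't reported directly; it has to be derived from two serverStatus snapshots taken some interval apart (which is essentially what mongostat does). A minimal sketch, using made-up snapshot dicts — on a live system you'd fetch each one with pymongo's `client.admin.command("serverStatus")`:

```python
# Two hypothetical serverStatus snapshots, 10 seconds apart.
snapshot_a = {"uptime": 3600, "extra_info": {"page_faults": 120_000}}
snapshot_b = {"uptime": 3610, "extra_info": {"page_faults": 123_500}}

interval = snapshot_b["uptime"] - snapshot_a["uptime"]
faults_delta = (snapshot_b["extra_info"]["page_faults"]
                - snapshot_a["extra_info"]["page_faults"])
faults_per_sec = faults_delta / interval

# The "more than one or two hundred per second" rule of thumb from above.
if faults_per_sec > 200:
    print(f"warning: {faults_per_sec:.0f} page faults/sec")
```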
  
It's often important to look at these parameters over time in order to determine whether there's a sudden spike or a repeating problem. This brings me to offline monitoring.
  
33	
  
Tools like MMS or graphite can show you these important metrics over time.

Using one of these tools is mandatory for a production system. I cannot tell you how useful they are.
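For graphite, getting a metric in is just a matter of writing lines in its plaintext protocol, `<metric.path> <value> <unix_timestamp>`, to the carbon port. A small sketch of formatting such a line — the metric name is a made-up example, and the actual socket send to carbon (typically TCP port 2003) is omitted:

```python
import time

def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol:
    "<metric.path> <value> <unix_timestamp>\n"."""
    timestamp = int(timestamp if timestamp is not None else time.time())
    return f"{path} {value} {timestamp}\n"

# Hypothetical metric name for queued readers on one shard.
line = graphite_line("mongo.shard1.queued_readers", 312, timestamp=1700000000)
print(line, end="")
```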
  
Whenever we get a ticket about a performance problem, we put our Sherlock hats on and start an investigation.
We look at metrics related to our application, but also at a lot of metrics related to mongo and how they change over time: we look at the number of queries, the number of documents in collections and tens of other metrics.

I'd like to show you an example workflow of a ticket.

Try to picture this: it was a quiet evening and I was about to go to sleep when I got an automated email that one of our shards was misbehaving. What were the symptoms? It had more than 300 queries just waiting in queue. What do I do next?
  
	
  
34	
  
I immediately open graphite. This is a screenshot of the number of page faults in green and the number of queued readers in blue. By looking at the history you can spot two trends:
1. First, there's a spike of high load every hour. This is actually normal, since we're doing hourly aggregations of our data.
2. The second trend is a massive rise in page faults and queued queries at exactly 20:00. At this point there's an impact on users, as a lot of queries take a very long time.
Why is this happening? Has the working set outgrown memory?
  
35	
  
Let's look at another screenshot of the same time frame. This time we look at other metrics: in blue is the number of queries, in green is the number of updates, and in red is the disk utilization.
Remember that disk utilization is measured as a percentage, so even though this graph is lower than the others, we can still see that at 20:00 the disk was constantly utilized at 100%.
When looking at the updates vs. the queries, it's obvious that a huge amount of updates is hurting query performance. We were busy writing to disk.

In this case an application change was the root cause of the problem: the application simply started updating a lot more documents.
So using graphite, we were able to trace the problem to a specific change in our application, and later on we modified our schema to reduce the document size and the load on disk.

This brings me to the next topic, which is optimization.
  
36	
  
When optimizing memory usage, the main target is to reduce the amount of memory your application requires.
The smaller the collections and documents are, the faster queries will be. This holds not just for memory but also for disk: if documents are smaller, less disk access is required to read them.

There are several optimizations you can do when it comes to schema:
1. First, shorten the keys. We started with long names like firstName, then shortened them to a single word or acronym, and finally used one or two letters, since it had a huge impact on the size of our data. By shortening the keys we reduced the size of our data by more than 50%. There is a big downside to doing this because it obscures the data, but fortunately we have an API that hides this ugly implementation detail, so it has no impact on our users.
2. Another thing to consider is the tradeoff between the number of documents and their size. In many use cases it's more efficient to store a smaller number of large documents rather than a large number of small ones.
We've previously seen how padding occupies memory; by changing the padding factor and running repair every so often you can reduce the padding overhead.
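The key-shortening approach can be sketched as a thin translation layer between readable field names and the short keys actually stored. The field names and mapping below are hypothetical, not our actual schema:

```python
# Hypothetical long-key -> short-key mapping hidden behind the API layer.
LONG_TO_SHORT = {"first_name": "fn", "last_name": "ln", "creation_time": "ct"}
SHORT_TO_LONG = {v: k for k, v in LONG_TO_SHORT.items()}

def to_storage(doc):
    """Translate a user-facing document to the short-key form stored in MongoDB."""
    return {LONG_TO_SHORT.get(k, k): v for k, v in doc.items()}

def from_storage(doc):
    """Translate a stored document back to readable long keys."""
    return {SHORT_TO_LONG.get(k, k): v for k, v in doc.items()}

stored = to_storage({"first_name": "Alon", "creation_time": 1700000000})
print(stored)  # keys are now "fn" and "ct"
```

Because every read and write goes through these two functions, callers never see the obscured keys, which is what keeps the optimization invisible to users.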
  
	
  
The next thing you can optimize is indices.
  
37	
  
First thing you should know is that unused indices are still accessed whenever documents are inserted, updated or deleted. Try to identify those and remove them.
Use sparse indices when only some of the documents will have the indexed attribute, as they use less space.
The last thing I want to talk about is how much of the index is located in memory. The answer is: it depends.

If the entire index is accessed by queries, then the entire index should be located in memory. If only a single part of the index is used, only that part has to fit in memory.
Let's look at a few examples to emphasize the difference. You can imagine an index as a segment of memory; the red marks are locations frequently accessed by queries.

The first example is an index on a date field called creation_time. Each inserted document inserts the largest value of all previous ones, so the rightmost part of the index is updated.
In many such indexes only the recent history is often accessed, so only the rightmost part of the index will be located in memory.
The second example is an index on a person's name. The index accesses will probably distribute evenly across the entire index, so most of it will be located in memory.
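To make the sparse-index saving concrete, here's a small simulation with made-up documents: a regular index carries an entry for every document (indexing a missing field as null), while a sparse index only carries entries for documents that actually have the field. With pymongo, the real thing would be `coll.create_index("nickname", sparse=True)`.

```python
# Hypothetical documents; only one of them has the "nickname" field.
docs = [
    {"_id": 1, "name": "alice", "nickname": "al"},
    {"_id": 2, "name": "bob"},
    {"_id": 3, "name": "carol"},
]

# Regular index: one (key, _id) entry per document, null when missing.
regular_entries = [(d.get("nickname"), d["_id"]) for d in docs]

# Sparse index: entries only for documents that carry the field.
sparse_entries = [(d["nickname"], d["_id"]) for d in docs if "nickname" in d]

print(len(regular_entries), len(sparse_entries))  # 3 1
```

The fewer entries the index holds, the less of it competes for space in RAM.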
  
38	
  
So let's summarize what we've learned:
1. We've seen how memory management works. We started from the disk and RAM, went up the stack to the page cache, whose sole purpose is to improve read and write performance by using memory. We continued to memory mapped files, which translate memory accesses like reads and writes into file reads and writes. And we finished with MongoDB's usage of these mechanisms.
2. We've talked about the challenges this strategy presents, like predicting and measuring the size of the working set.
3. We then talked about monitoring, which is something you have to do if you have a DB running in production.
4. We finished with schema and index optimizations, which are crucial for cutting costs and improving performance.
  
39	
  
And that's it! I hope you enjoyed my talk, and thanks for having me.
  
	
  
	
  
40	
  

Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...
Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdf
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 

MongoDB memory management demystified

  • 2. Hello everyone, my name is Alon Horev. I'm based in Israel and I work at Intucell, which was acquired by Cisco. I'm a Python developer and I lead Intucell's data team. About two years ago we migrated our product off MySQL and started working with MongoDB. I want to start by introducing our use case for MongoDB: we've built a system that optimizes cellular networks automatically. Optimizing cellular networks is about making your data connection faster and improving the quality of your calls.
  • 3. The way we do this is pretty simple: we collect a lot of statistics about what goes on in the network, like how many calls are taking place or how many users are connected to an antenna. We then analyze this information to identify things like which antennas are overloaded. Once we know what the problems in the network are, we act: we change parameters in the network. For example, we might force your phone to use a different antenna so you get better service. As you can see, this process is cyclic: we collect more statistics to make further changes and verify that we improved the network. This happens all the time, even here right now, with AT&T. In the process of working with MongoDB we learned a lot about database performance and server performance. I personally spent a lot of time monitoring and optimizing storage and memory usage, which brings me to this lecture.
  • 4. Today I'm going to try to give you an understanding of how MongoDB manages memory. So, first, what is "memory management" when it comes to MongoDB? Well, memory is a fast but limited and expensive resource; memory management is about deciding which data to keep in memory.
  • 5. Why should you care about memory management? It has a huge impact on performance and costs. This matters to both developers and DBAs: as a developer you can optimize the schema and queries for better memory usage, and as a DBA you can monitor and predict performance issues related to memory usage. I'm pretty sure every MongoDB administrator has asked himself at least once: how much memory do I really need? Before we dive in, I want to tell you a little secret: MongoDB doesn't actually manage memory. It leaves that responsibility to the operating system.
  • 6. Within the operating system there's a stack of components that MongoDB depends on to manage memory. Each component relies on the component below it. (!) This talk is structured around this stack of components. We'll start with the low-level components, the storage devices: disks and RAM. We'll continue with the page cache and memory-mapped files, which are part of the operating system's kernel, and we'll finish with MongoDB's usage of these mechanisms. (!) Let's talk about storage.
  • 7. There are different types of storage devices with different characteristics; we'll review hard disk drives (HDDs), solid state drives (SSDs) and RAM. Let's start by breaking these into categories: (!) HDDs and SSDs are persistent and RAM isn't, but RAM is really fast. That's why every computer has both types of storage: one persistent (an HDD or an SSD) and one volatile (RAM).
  • 8. Now let's compare throughput. As I said before, RAM is fast: it can go as fast as 6400 MB/s for reads and writes. SSDs are about 10 times slower than RAM; modern SSDs can reach a read rate of 650 MB/s and a little less for writes. HDDs are much slower, ranging from 1 MB/s to 160 MB/s for reads and writes. The reason there's such variance in HDD speed is that throughput is highly affected by access patterns. Specifically with HDDs, random access is much slower than sequential access, because an HDD contains a mechanical arm that has to move on almost every random access. Sadly for us, databases do a lot of random I/O, which means that if you're running a query on data that's not in memory and it therefore has to be read from disk, you're seeing a penalty of about two orders of magnitude on response times. The next characteristic is price. (!) To make the comparison easier we'll compare the price per GB. It's not surprising that there's a correlation between price and throughput: the more you pay per GB, the better the throughput. Hard drives are really cheap at 5 cents per GB, SSDs are 10 times more expensive, and RAM is 100 times more expensive.
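To get a feel for these throughput figures, here is a back-of-envelope calculation using the numbers from the slide; the 16 GB working set is an assumed example size, not from the talk:

```python
# Time to stream a working set sequentially at the slide's throughput figures.
THROUGHPUT_MB_PER_S = {"ram": 6400, "ssd": 650, "hdd": 160}  # sequential reads

def seconds_to_read(size_gb, device):
    """Seconds needed to read size_gb gigabytes from the given device."""
    return size_gb * 1024 / THROUGHPUT_MB_PER_S[device]

for device in ("ram", "ssd", "hdd"):
    print(f"{device}: {seconds_to_read(16, device):.1f}s")
```

Random access on an HDD would be far worse than this sequential figure, which is exactly the penalty the slide describes.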
  • 9. Is this information sufficient to choose the optimal hardware configuration? I think not; your application's requirements are also part of the equation. For example, if your application is an archive that stores huge amounts of data that is rarely accessed, you can go for a large HDD and save a lot of money. Later on we'll see how you can take measurements of things like RAM usage and capacity, and then you'll be able to determine what kind of hardware configuration you need.
  • 10. Now let's zoom out of storage and move up to the next layer, which is the page cache.
  • 11. The page cache is part of the operating system's kernel, and whenever a program does file I/O, such as reads and writes, it always goes through the page cache. The page cache makes reads faster by keeping popular chunks of data in memory, and makes writes faster by letting the application write to memory rather than to disk. You could say the page cache was invented to combine the disk's persistence with memory's speed: it's about having the best of both worlds.
  • 12. So... it's called the page cache, but what is a page? A page is a 4 KB chunk of data. Each file is broken into pages; the number of pages belonging to a file is simply the file's size divided by 4 KB. (!) Looking at the example, you can see a file spanning 3 pages because it's 10 kilobytes in size. The grey area is the unused part of the last page, since the file's size isn't a multiple of 4 kilobytes. The page cache's job is to determine which pages to keep in memory.
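The page arithmetic from the slide can be written down in a couple of lines:

```python
import math

PAGE_SIZE = 4096  # bytes

def pages_for_file(size_bytes):
    """Number of 4 KB pages a file of the given size spans."""
    return math.ceil(size_bytes / PAGE_SIZE)

# The 10 KB file from the slide spans 3 pages; the tail of the third
# page is the unused grey area.
print(pages_for_file(10 * 1024))                           # -> 3
print(pages_for_file(10 * 1024) * PAGE_SIZE - 10 * 1024)   # unused bytes -> 2048
```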
  • 13. Let's dive a little deeper and see what happens behind the scenes when we read from a file. (!) We have a process running in user space and it reads 100 bytes from a file. (!) Through a system call we get to the kernel, where the page cache handles the read request. (!) First, the page cache translates the position and count of bytes to read into a list of pages. If we read 100 bytes from the beginning of the file, the result of this step would be the first page. (!) Next, the page cache checks whether the page exists in the cache; (!) if it doesn't, the data has to be read from disk and is then stored in the cache. Once the page is in the cache we reach the last step, (!) which is to copy the data to the user-space application. So that's how a read works.
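The first step of that read path, translating a byte range into a list of pages, can be sketched as:

```python
PAGE_SIZE = 4096

def pages_for_read(offset, count):
    """Translate an (offset, count) byte read into the list of page numbers
    the page cache has to look up, as in the first step on the slide."""
    if count <= 0:
        return []
    first = offset // PAGE_SIZE
    last = (offset + count - 1) // PAGE_SIZE
    return list(range(first, last + 1))

print(pages_for_read(0, 100))      # 100 bytes from the start -> [0]
print(pages_for_read(4000, 200))   # a read straddling a page boundary -> [0, 1]
```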
  • 14. The page cache also handles writes. (!) This time our process calls the write system call. (!) The page cache copies the data from the process into the relevant pages and marks them as dirty. That's all it does: change data in memory. It gives the impression the data has been written, when in fact it has been written only to memory and not to disk. If an application reads from the file it gets the latest data from memory, because dirty pages must stay in the cache. Having dirty pages is somewhat dangerous for two reasons: first, they will be lost if the operating system crashes; second, if there's a lack of memory they can't be freed. The solution to these problems is to flush the dirty pages to disk. (!) There's a thread in the kernel that flushes pages after they've been in the cache for some time, or when memory needs to be freed. If a process wants to make sure the data is flushed to disk it can call the fsync system call, which triggers a flush for a specific file or even the entire file system. MongoDB calls that every 30 seconds to make sure data is backed by disk.
  • 15. I mentioned that the page cache frees pages when memory is running low; this procedure is called page reclamation. There are different page reclamation policies. A page reclamation policy is an algorithm that answers a simple question: "what's the next page that can be freed?" In Linux, the simple answer is: "the one that was least recently used". It turns out page reclamation happens all the time, even on healthy systems; it doesn't mean you're out of memory. That's because the page cache is greedy and will try to use all the free memory on your machine to cache the file system. To understand how much memory is used by the page cache you can use the free command.
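As a toy illustration of least-recently-used reclamation (and of why dirty pages complicate it), here is a minimal sketch; the real kernel is far more sophisticated than this:

```python
from collections import OrderedDict

class TinyPageCache:
    """A toy LRU page cache: illustrates reclamation, not the real kernel."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page number -> dirty flag, LRU first

    def access(self, page, dirty=False):
        if page in self.pages:
            dirty = dirty or self.pages.pop(page)  # keep dirty bit, bump to MRU
        elif len(self.pages) >= self.capacity:
            self.reclaim()
        self.pages[page] = dirty

    def reclaim(self):
        # Free the least recently used *clean* page; dirty pages must be
        # flushed to disk before they can be freed, so we skip them here.
        for page, dirty in self.pages.items():
            if not dirty:
                del self.pages[page]
                return page

cache = TinyPageCache(capacity=2)
cache.access(0, dirty=True)
cache.access(1)
cache.access(2)           # cache full: clean LRU page 1 goes, dirty page 0 stays
print(list(cache.pages))  # -> [0, 2]
```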
  • 16. free is a Linux program that displays memory usage statistics. Let's try to interpret its output. When running free with -g it prints units in GB. The first line shows the total amount of memory, 64 GB; out of that, 61 GB are used and 3 GB are free. Then, out of the 61 GB that are used, 55 GB are cached data. These are pages in the page cache. The second line counts the cached data as free, so suddenly we have only 5 GB of used memory. This is memory directly allocated by programs. The reason cached memory can be considered free is that even though it's in use, it will be released if programs need it: as soon as programs allocate memory and the free memory runs out, the page cache shrinks and frees pages.
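The arithmetic behind free's second line ("-/+ buffers/cache") is simple; the figures below mirror the slide's example, and the 1 GB of buffers is an assumption made so the numbers line up:

```python
# free's second line just moves the page cache (and buffers) from the
# "used" column to the "free" column. All figures are in GB.
total, used, free_, buffers, cached = 64, 61, 3, 1, 55

used_minus_cache = used - buffers - cached   # memory directly allocated by programs
free_plus_cache = free_ + buffers + cached   # memory available if programs need it

print(used_minus_cache, free_plus_cache)     # -> 5 59
```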
  • 17. The next component up the stack is memory-mapped files.
  • 18. Memory mapping of files is an alternative mechanism for reading and writing files. Instead of calling the read() and write() system calls, a process can map part of a file into memory, and every access the process makes to that memory translates into a file read or write. On the left you can see a process with a memory region that is mapped to a segment of a file: memory addresses 100 to 200 are mapped to a file segment that starts at 400 and ends at 500, so a write to memory address 100 is translated into a write to the file at address 400. Mapping a file into memory doesn't necessarily load its data into memory: if a process reads from a page that is not in memory, the infamous page fault is triggered. The code in the kernel that handles page faults tells the page cache to load the required pieces of data from disk and then serves the read. Memory mapping has several advantages over regular file I/O. First, it's fast: there's no system call involved and no copying of memory; reads and writes access memory that is allocated in the page cache. Second, it takes the responsibility for memory management away from the user: as we've seen, the page cache determines what's actually stored in memory.
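You can try this mechanism yourself with Python's standard-library mmap module; this small sketch writes to a file through plain memory access, with no write() call in between:

```python
# Memory-mapped file I/O with the stdlib mmap module: a slice assignment on
# the mapping lands in the file via the page cache.
import mmap, os, tempfile

path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)            # one page worth of zeroes

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:
        mm[100:105] = b"hello"         # plain memory access, not a write() syscall
        mm.flush()                     # like fsync: force dirty pages to disk

with open(path, "rb") as f:
    f.seek(100)
    print(f.read(5))                   # -> b'hello'
```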
  • 19. In this example two processes map the same region of a file into memory. Only one copy of this data will occupy memory, or even less if it's not accessed. Historically this mechanism was invented to reduce the memory usage of processes: whenever you execute a program, the program's code and its shared libraries are mapped into memory. So if you open 10 instances of Chrome, its code still appears only once in memory.
  • 20. Now let's see how Mongo uses this stack of components.
  • 21. (!) Mongo maps all its data into memory: the documents, the indexes and the journal. (!) When running top you can actually see how much memory is mapped and how much is used. (!) The left column, called VIRT, stands for virtual memory; once a process maps files into memory, they're accounted under virtual memory. When journaling is enabled, Mongo actually maps the data files twice, so this figure is twice the amount on disk, which is about 273 GB. RES stands for resident memory: the part of the virtual memory that's actually located in RAM. SHR stands for shared resident memory, so out of the 24 GB of resident memory, 23 GB is data from memory-mapped files, which is sharable.
  • 22. It turns out this very cool strategy for managing memory also has problems. The biggest problem is that MongoDB (!) has no control over what is kept in memory. You can't tell Mongo: promise me this document or collection stays in memory, thereby ensuring fast access. Why is this a problem? I'll give you some examples:
1. (!) The first example is warm-up: after restarting your server, none of the data is in memory, so for every page that is accessed for the first time a page fault is triggered and the query takes longer.
2. (!) The second example is what I call expensive queries: queries that aren't indexed well or that request data that is hardly ever accessed. When this happens, documents are loaded into memory at the cost of evicting other, more important documents. Why does this happen? As we've seen before, the page cache frees the least recently used pages first.
There are things you can do to mitigate this problem.
  • 23. What we did is (!) protect MongoDB with an API. The API enforces index usage so Mongo reads fewer documents into memory. Another thing the API does is pass a query timeout to make sure costly queries are cancelled. The API doesn't have to be complicated; it could be a simple module sitting on top of the MongoDB driver. Let's look at an example: (!) this is (!) a Python function called find_samples, used whenever we want to run a find query on the collection named samples. The function accepts two parameters that define a date range: start_time and end_time. By forcing the user to pass a date range we make sure the query is indexed. You could add further validations to make sure the range isn't too big or doesn't go too far back in history.
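The slide's actual code isn't reproduced in these notes, so here is a sketch of what such a wrapper could look like; the field name "timestamp", the MAX_RANGE limit and the 5-second timeout are assumptions, not the talk's values:

```python
# A hypothetical find_samples wrapper: force an indexed date range and a
# server-side timeout on every query against the samples collection.
from datetime import datetime, timedelta

MAX_RANGE = timedelta(days=14)  # assumed limit: refuse huge scans

def find_samples(collection, start_time, end_time, **extra_filters):
    if end_time <= start_time:
        raise ValueError("end_time must be after start_time")
    if end_time - start_time > MAX_RANGE:
        raise ValueError("date range too large, narrow the query")
    query = {"timestamp": {"$gte": start_time, "$lt": end_time}}
    query.update(extra_filters)
    # max_time_ms asks the server to cancel runaway queries (pymongo's
    # find() accepts it and maps it to the $maxTimeMS operator).
    return collection.find(query, max_time_ms=5000)
```

The point is that callers can never issue an unbounded, unindexed scan: the range check and the timeout travel with every query.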
  • 24. Another challenge worth mentioning is (!) the lack of prioritization between processes. When processes allocate a lot of memory, the page cache shrinks automatically, and since Mongo relies on the page cache, you could say Mongo's memory shrinks automatically. In other words, Mongo has a lower priority than other processes when it comes to memory. Since Mongo will simply become slower if it doesn't have enough memory, you need to be careful with other processes running on the same server. You can mitigate this by isolating Mongo: (!) don't run it on the same server as memory- or disk-intensive applications. The last challenge I'd like to tackle is (!) estimating how much memory is required, also known as the size of the working set.
  • 25. So what is the working set? It's the data your application reads regularly and that should be returned in a timely manner; therefore it should fit in memory. The working set contains (!) more than documents: it also includes indexes and some padding. To emphasize the padding issue, let's look at an example memory page. (!) As I mentioned before, a page's size is 4 KB. This page holds 3 documents with some padding between them; the padding accounts for expansion of existing documents or insertion of new ones. Of the three documents, only document number 2 is accessed regularly. So even though only a small part of this page is actually used, the whole page is kept in memory; the page cache can't keep half pages in memory. This brings us to the conclusion that it's really hard to measure the size of the working set by simply looking at the count or size of the documents being queried. Still, there are several tools to help you estimate how much memory a collection should require.
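To put numbers on the padding point, here is the example page with made-up document sizes (the slide doesn't give any): only one document is hot, yet the whole 4 KB page stays resident.

```python
# One 4 KB page holding three documents; only document 2 is hot.
# The document sizes are hypothetical, chosen for illustration.
PAGE_SIZE = 4096
doc_sizes = {1: 1100, 2: 900, 3: 1200}   # bytes, plus padding fills the rest
hot_docs = {2}

hot_bytes = sum(size for doc, size in doc_sizes.items() if doc in hot_docs)
print(f"{hot_bytes / PAGE_SIZE:.0%} of the page is hot, but 100% stays resident")
```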
  • 26. The tools fall into two categories: planning and monitoring.
  • 27. Planning is about predicting how much memory each collection is going to need. Let's take a real-world example. In one of our collections we keep a month of history. Out of that month, we know our application often queries the last two weeks and sometimes the week before that. The last two weeks are considered "hot" data because they have to be in memory; the week before that is considered warm: it doesn't have to be in memory, but we should still take it into account so it won't push out the hot data. If we take some spare room to compensate for padding and such, it's safe to assume 3 out of the 4 weeks should fit in memory. (!) You can use the collection stats command to get important metrics like the size of the indexes and the size of the data, and roughly calculate how much memory the collection is going to require. Once you have a running database you can use several monitoring tools to analyze the working set.
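The planning arithmetic for this example might look like the sketch below; the stats numbers are invented stand-ins for what db.samples.stats() would return (size and totalIndexSize are in bytes):

```python
# Rough capacity planning: 3 of the 4 weeks of data should fit in memory,
# and indexes are usually hot in full.
stats = {"size": 40 * 1024**3, "totalIndexSize": 6 * 1024**3}  # hypothetical

HOT_FRACTION = 3 / 4  # hot + warm weeks out of the month of history

required_gb = (stats["size"] * HOT_FRACTION + stats["totalIndexSize"]) / 1024**3
print(f"{required_gb:.0f} GB")
```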
  • 28. When I think about monitoring tools, they generally fall into two categories:
1. (!) Online monitoring, which is basically seeing what's going on at the moment. This category includes Linux commands like top and iostat, and Mongo commands like currentOp, mongostat and mongomem.
2. (!) Offline monitoring, which is more about collecting and aggregating historical data. One example is the profiling collection that records slow queries over time; another is MMS or other graphing tools like Graphite that collect different metrics over time. These are used for identifying trends and correlations and for predicting growth.
Let's start with the online tools.
  • 29. mongomem is a great tool for memory-usage analysis. It's written in Python by the people at a company called Wish, so you'll have to install it manually; it doesn't come packaged with MongoDB. mongomem won't tell you how much memory you need, but it will tell you how much memory each collection is using at the moment. Here's an example output: (!) each line shows how many megabytes of the collection are in memory. The top collection in this example is the oplog, with more than 11 GB of data in memory out of almost 50 GB, so about 22% of the collection is in memory. The last line shows the total amount of memory used by Mongo out of the total data size: in this example we have 16 GB of data in memory out of 280 GB of total data. Since I've got 16 GB of memory on this machine, we can see all the memory is being used. But what does this say about the working set? Is it larger than memory? In other words, do we have enough memory? Well, we can't say, because it's possible there's data in memory that is hardly ever accessed; the page cache just hasn't had to reclaim those pages.
  • 30. What you can do in order to test how much RAM Mongo actually uses is the following procedure:
1. First, stop the database.
2. Then clear the page cache; the command shown invokes code in the kernel that drops all pages from memory.
3. Next, start the database.
4. After that, invoke the queries that should cover your working set: queries that access all the data you expect to have in memory.
5. At this point, running mongomem will give you a much more accurate picture of how much memory is required.
  • 31. Before looking at additional tools, I want to answer a simple question: how do we know when something is wrong? What do we need to monitor? And since we're talking about memory: how do we know we don't have enough of it? Well, the phenomenon of not having enough memory is called thrashing. When the OS is thrashing, an application is constantly accessing pages that are not in memory, and the OS is busy handling the page faults, reading the pages from disk. So the first thing to monitor is page faults (!), and since it's hard to tell how many page faults are too many, you should also look at disk utilization: if the disk is utilized 100% of the time, you're in trouble. There are a lot of other things that go wrong, like (!) many queries being queued and high locking ratios, but those are just symptoms.
  • 32. I usually use iostat to look at disk utilization. Here's an example output of the command: the rightmost column shows the disk utilization and reveals a disk that is busy 100% of the time. The second column shows the disk serving 570 reads per second, and the third column shows the number of writes per second, which is zero. If this is happening constantly, the working set does not fit in memory. Along with iostat, I frequently use mongostat.
  • 33. mongostat comes packaged with MongoDB and uses the underlying (!) serverStatus command. It displays a bunch of interesting metrics, like (!) the number of page faults and queued reads. It's pretty hard to say how many page faults are too many, but more than one or two hundred page faults per second indicate a lot of data being read from disk. If this happens over long periods of time, it can indicate that the working set does not fit in RAM. If the number of queued reads is larger than a hundred over long periods of time, that can also indicate the working set doesn't fit in RAM. It's often important to look at these parameters over time in order to determine whether there's a sudden spike or a recurring problem. This brings me to offline monitoring.
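A minimal health check applying these rules of thumb could look like this; the thresholds are the slide's rough figures, and how you sample the counters (mongostat output, serverStatus deltas) is left open:

```python
# Rules of thumb from the slide, turned into a tiny check.
FAULTS_PER_SEC_LIMIT = 200   # "one or two hundred page faults per second"
QUEUED_READS_LIMIT = 100     # queued readers over long periods

def working_set_warnings(faults_per_sec, queued_readers):
    """Return human-readable warnings if the working set may not fit in RAM."""
    warnings = []
    if faults_per_sec > FAULTS_PER_SEC_LIMIT:
        warnings.append(f"{faults_per_sec:.0f} page faults/s: heavy disk reads")
    if queued_readers > QUEUED_READS_LIMIT:
        warnings.append(f"{queued_readers} queued readers: queries are piling up")
    return warnings

print(working_set_warnings(350, 20))   # -> ['350 page faults/s: heavy disk reads']
print(working_set_warnings(50, 10))    # -> []
```

As the slide stresses, a single sample means little; these checks only become meaningful over long windows.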
  • 34. Tools like (!) MMS or Graphite can show you these important metrics over time. Using one of these tools is (!) mandatory for a production system; I cannot tell you how useful they are. Whenever we get a ticket about a performance problem, we put on our Sherlock hats and start an investigation. We look at metrics related to our application, but also at a lot of metrics related to Mongo and how they change over time: the number of queries, the number of documents in collections and tens of other metrics. I'd like to show you an example workflow of a ticket. Try to picture this: it was a quiet evening, I was about to go to sleep, when I got an automated email that one of our shards was misbehaving. What were the symptoms? It had more than 300 queries just waiting in the queue. What did I do next?
  • 35. I immediately opened Graphite. This is a screenshot of the number of page faults in green and the number of queued readers in blue. By looking at the history you can spot two trends:
1. First, there's a spike of high load every hour. This is actually normal, since we do hourly aggregations of our data.
2. Second, there's a massive rise in page faults and queued queries at exactly 20:00. At this point there's an impact on users, as a lot of queries take a very long time.
Why is this happening? Has the working set outgrown memory?
  • 36. Let's look at another screenshot of the same time frame, this time with other metrics: in blue the number of queries, in green the number of updates, and in red the disk utilization. Remember that disk utilization is measured as a percentage, so even though this graph is lower than the others, we can still see that at 20:00 the disk was constantly utilized at 100%. Looking at updates vs. queries, it's obvious that a huge amount of updates was hurting query performance: we were busy writing to disk. In this case an application change was the root cause of the problem; the application had simply started updating a lot more documents. So using Graphite we were able to trace the problem to a specific change in our application, and later on we modified our schema to reduce the document size and the load on the disk. This brings me to the next topic, which is optimization.
• 37. When optimizing memory usage, the main target is to reduce the amount of memory your application requires. The smaller your collections and documents are, the faster your queries will be, and not just in terms of memory but also disk: if documents are smaller, less disk access is required to read them.
There are several optimizations you can do when it comes to schema:
1. First, shorten the keys. We started with long names like firstName, then shortened them to a single word or acronym, and finally used one or two letters, since it had a huge impact on the size of our data. By shortening the keys we reduced the size of our data by more than 50%. There is a big downside to doing this because it obscures the data, but fortunately we have an API that hides this ugly implementation detail, so it doesn't have an impact on our users.
2. Another thing to consider is the tradeoff between the number of documents and their size; in many use cases it's more efficient to store a small number of large documents vs. a large number of small ones.
We've previously seen how padding occupies memory; by changing the padding factor and running repair every so often, you can reduce the padding overhead.
The next thing you can optimize is indices. 37
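To make the key-shortening idea concrete, here is an illustrative sketch of a mapping layer like the API mentioned above. The field names and short-key mapping are invented, and BSON sizes are approximated with JSON, but the relative savings on key-heavy documents are similar.

```python
import json

# Illustrative key-shortening layer: applications read and write long
# field names, while the stored documents use one- or two-letter keys.
# KEY_MAP is a made-up example mapping, not our real schema.

KEY_MAP = {"firstName": "fn", "lastName": "ln", "creationTime": "ct"}
REVERSE_MAP = {v: k for k, v in KEY_MAP.items()}

def shorten(doc):
    """Translate long application-facing keys to short stored keys."""
    return {KEY_MAP.get(k, k): v for k, v in doc.items()}

def expand(doc):
    """Translate short stored keys back to long application-facing keys."""
    return {REVERSE_MAP.get(k, k): v for k, v in doc.items()}

doc = {"firstName": "Alon", "lastName": "Horev", "creationTime": 1}
short = shorten(doc)

assert expand(short) == doc  # the API round-trips; users never see "fn"
print(len(json.dumps(doc)), len(json.dumps(short)))  # long vs. short encoding
```

Because MongoDB stores every key name inside every document, the saving multiplies across the whole collection, which is why it mattered so much at our scale.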
• 38. The first thing you should know is that unused indices are still accessed whenever documents are inserted, updated or deleted. Try to identify those and remove them.
Use sparse indices when only some of the documents have the indexed attribute, as they use less space.
The last thing I want to talk about is how much of an index is located in memory. The answer is: it depends. If the entire index is accessed by queries, then the entire index should be located in memory. If only a single part of the index is used, only that part has to fit in memory.
Let's look at a few examples to emphasize the difference. You can imagine an index as a segment of memory, where the red marks are locations frequently accessed by queries.
The first example is an index on a date field called creation_time. Each inserted document carries a value larger than all previous ones, so the right-most part of the index is updated. In many such indexes only the recent history is frequently accessed, so only the right-most part of the index will be located in memory.
The second example is an index on a person's name; the index accesses will probably distribute evenly across the entire index, so most of it will be located in memory. 38
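A rough way to see the sparse-index saving, as a purely illustrative sketch with invented sample documents: a sparse index holds one entry per document that actually has the field, while a regular index holds an entry (possibly null) for every document in the collection.

```python
# Illustrative comparison of index entry counts for a regular vs. a
# sparse index on "nickname". Documents are made-up examples.

docs = [
    {"_id": 1, "name": "alice", "nickname": "al"},
    {"_id": 2, "name": "bob"},                    # no nickname
    {"_id": 3, "name": "carol"},                  # no nickname
    {"_id": 4, "name": "dave", "nickname": "d"},
]

regular_entries = len(docs)                       # every document is indexed
sparse_entries = sum(1 for d in docs if "nickname" in d)

print(regular_entries, sparse_entries)  # 4 2
```

When the indexed attribute is rare, the sparse index can be a small fraction of the regular one, which means less memory and disk.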
• 39. So let's summarize what we've learned:
1. We've seen how memory management works. We started from the disk and RAM, went up the stack to the page cache, whose sole purpose is to improve read and write performance by using memory. We continued to memory-mapped files, which translate memory accesses like reads and writes into file reads and writes. And we finished with MongoDB's usage of these mechanisms.
2. We've talked about the challenges this strategy presents, like predicting and measuring the size of the working set.
3. We then talked about monitoring, which is something you have to do if you have a DB running in production.
4. We finished with schema and index optimizations, which are crucial for cutting costs and improving performance. 39
• 40. And that's it! I hope you enjoyed my talk, and thanks for having me. 40