15. Moji
• A file-like MogileFS client for Java developers
• Production-ready features
– Connection pooling, load balancing, fault-tolerant…
• Quality
– Spring friendly, integration tests, well documented, actively developed…
https://github.com/mogilefs-moji/moji
16. Configuration
• Using plain-old-Java
SpringMojiBean moji = new SpringMojiBean();
moji.setAddressesCsv("192.168.0.1:7001,192.168.0.2:7001");
moji.setDomain("testdomain");
moji.initialise();
moji.setTestOnBorrow(true);
• Using the Spring framework
moji.tracker.address=192.168.0.1:7001,192.168.0.2:7001
moji.domain=testdomain
<import resource="moji-context.xml" />
25. Multiple Sites
• Given a network of 10.10.0.0/16
• All machines are configured with a /16 netmask (255.255.0.0). When assigning IP addresses to machines, pick them from 10.10.5.0/24
• IP assignment
– web1: 10.10.5.1 (netmask 255.255.0.0 or /16)
– web2: 10.10.5.2
– tracker1: 10.10.5.3
– tracker2: 10.10.5.4
– storage node 1: 10.10.5.5
– storage node 2: 10.10.5.6
– storage node 3: 10.10.8.1
• For the MogileFS zones, configure:
– near=10.10.5.0/24 far=10.10.8.0/24
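As a rough illustration of how the near/far zone definitions partition the hosts listed above, the following sketch maps an IP address to a zone by subnet membership. It uses only Python's standard `ipaddress` module; the `ZONES` table and the `zone_of` helper are hypothetical names for this example and are not part of MogileFS.

```python
import ipaddress

# Zone definitions copied from the slide; the dict itself is illustrative.
ZONES = {
    "near": ipaddress.ip_network("10.10.5.0/24"),
    "far": ipaddress.ip_network("10.10.8.0/24"),
}


def zone_of(ip):
    """Return the name of the zone whose subnet contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for name, network in ZONES.items():
        if addr in network:
            return name
    return None


if __name__ == "__main__":
    for host, ip in [("storage node 1", "10.10.5.5"),
                     ("storage node 3", "10.10.8.1")]:
        print(host, ip, zone_of(ip))
```

All hosts still share the /16 netmask, so they can reach each other directly; the /24 subnets only serve to tell zones apart.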
[Diagram: web1, web2, tracker1, tracker2, node1, node2, and node3 laid out across the "near" and "far" zones]
26. Scrubber
• Run FSCK routinely as a scrubber
• Modified Algorithm
– Remove exhaustive search
– Improve performance at large scale
https://github.com/mogilefs/MogileFS-Network/blob/master/lib/MogileFS/ReplicationPolicy/HostsPerNetwork.pm#L84
mogadm fsck status | grep " Yes " || \
  (mogadm fsck reset; mogadm fsck clearlog; mogadm fsck start) \
  >/var/log/mogadm.fsck 2>&1
27. Modern durable write
• AS-IS
[Diagram: client, trackers, MySQL, and storage nodes]
4. Write other copies asynchronously
Assume that a file must have at least three replicas in the system to meet the durability requirement
28. Modern durable write
• TO-BE
[Diagram: client, trackers, MySQL, and storage nodes]
2. Write at least two copies before ACK
4. Write other copies asynchronously
Assume that a file must have at least three replicas in the system to meet the durability requirement
mogilefs-moji#25
mogilefs/MogileFS-Server#39
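The TO-BE flow can be sketched as follows: the write is acknowledged only after a minimum number of copies exist, and the remaining replicas are written in the background. This is a minimal in-memory simulation under stated assumptions, not the Moji or MogileFS implementation; `Store`, `durable_write`, and the `sync_copies` parameter are hypothetical names for this example.

```python
from concurrent.futures import ThreadPoolExecutor


class Store:
    """Hypothetical in-memory stand-in for a storage node."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}

    def put(self, key, data):
        self.blobs[key] = data


def durable_write(key, data, stores, pool, sync_copies=2):
    """Write sync_copies replicas before returning (the ACK point),
    then schedule the remaining replicas asynchronously on pool.

    Returns the futures for the asynchronous copies.
    """
    if len(stores) < sync_copies:
        raise ValueError("not enough stores for the synchronous copies")
    # Synchronous part: these copies exist before the caller gets an ACK.
    for store in stores[:sync_copies]:
        store.put(key, data)
    # Asynchronous part: the remaining replicas are written in the background.
    return [pool.submit(store.put, key, data) for store in stores[sync_copies:]]


if __name__ == "__main__":
    nodes = [Store("store1"), Store("store2"), Store("store3")]
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending = durable_write("photo.jpg", b"bytes", nodes, pool)
        print("acknowledged with", len(nodes) - len(pending), "copies")
        for future in pending:
            future.result()  # wait for the background replicas
    print("replicas:", sum("photo.jpg" in n.blobs for n in nodes))
```

Setting `sync_copies=1` in this sketch corresponds to the AS-IS behaviour, where only one copy exists at the moment of acknowledgement.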
30. Analysis
• Combinatorial analysis model
– Assume that each disk fails independently
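– Assume that after x hours of operation each block survives with probability P(x_i) = p
– Probability of failure: q = 1 - p
– For replication with n replicas, this gives a naive formula for the probability that the data survives: 1 - q^n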
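The naive formula can be evaluated directly. The sketch below assumes independent failures, as stated above; the function name `survival_probability` and the sample value q = 0.01 are illustrative and not taken from the slides.

```python
def survival_probability(q, n):
    """Probability that at least one of n independent replicas survives,
    given that each replica fails with probability q: 1 - q**n."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - q ** n


if __name__ == "__main__":
    # Illustrative failure probability only.
    for replicas in (1, 2, 3):
        print(replicas, survival_probability(0.01, replicas))
```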
31. Analysis
• When also considering
– Non-Recoverable Errors (NREs)
– Drive failure events modeled as a Poisson process
– Site failures (e.g. due to regional disasters)
– Replication latency, mark-out time
– …
• Analysis of system durability is commonly done with Markov models
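To make the Markov approach concrete, here is a minimal sketch for the simplest case: two replicas, a per-disk failure rate lam, a re-replication rate mu, and no NREs. The three states are "two copies", "one copy", and "data lost". Solving the expected-time-to-absorption equations gives the closed forms used below. This is an assumed textbook-style model, not necessarily the one behind the figures on these slides, and the function and parameter names are illustrative.

```python
def mttdl_two_replicas(lam, mu, start_copies=2):
    """Mean time to data loss for a 3-state Markov chain.

    State "2 copies" -> "1 copy" at rate 2*lam (either disk fails).
    State "1 copy"   -> "lost"   at rate lam, or back to "2 copies" at rate mu.

    Expected times to loss satisfy:
        E2 = 1/(2*lam) + E1
        E1 = 1/(lam + mu) + (mu/(lam + mu)) * E2
    which solve to the closed forms below.
    """
    if lam <= 0 or mu < 0:
        raise ValueError("lam must be positive and mu non-negative")
    if start_copies == 2:
        return (3 * lam + mu) / (2 * lam ** 2)
    if start_copies == 1:
        return (2 * lam + mu) / (2 * lam ** 2)
    raise ValueError("start_copies must be 1 or 2")


if __name__ == "__main__":
    lam = 1 / 500_000   # mean disk life of 500K hours
    mu = 1 / 24         # assumed: one re-replication per 24 hours
    print(mttdl_two_replicas(lam, mu, start_copies=2))
    print(mttdl_two_replicas(lam, mu, start_copies=1))
```

In this simplified model, starting with two copies instead of one adds exactly 1/(2*lam) hours to the expected time to data loss, independent of mu; richer models that include NREs or replication latency behave differently.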
32. Analysis
• Example of durable write
– Assume mean disk life is 500K hrs
– 2 replicas, no NRE
[Chart: difference in MTTDL in hours (y-axis, 249960 to 250080) versus mu (x-axis: 1, 0.041666667, 0.020833333, 0.013888889); series: "diff disk life 5"]
The lower the replication rate, the greater the improvement from durable write
33. Analysis
• Example of probability of data loss
[Chart: probability of data loss (y-axis, 0 to 8.0E-05) over x = 1 to 14; series: "P of data loss 72", "P of data loss 48", "P of data loss 24", "P of data loss 1"]