SlideShare une entreprise Scribd logo
1  sur  157
Télécharger pour lire hors ligne
Vectorized VByte
Decoding
Jeff Plaisance, Nathan Kurz, Daniel Lemire
Inverted Indexes
Inverted Index
● Like index in the back of a book
● words = terms, page numbers = doc ids
● Term list is sorted
● Doc list for each term is sorted
doc id query country impressions clicks
0 software Canada 10 1
1 blank Canada 10 0
2 sales US 5 0
3 software US 8 1
4 blank US 10 1
Standard Index
Constructing an Inverted Index
query country impression clicks
doc id blank sales software Canada US 5 8 10 0 1
0 ✔ ✔ ✔ ✔
1 ✔ ✔ ✔ ✔
2 ✔ ✔ ✔ ✔
3 ✔ ✔ ✔ ✔
4 ✔ ✔ ✔ ✔
Constructing an Inverted Index
field term 0 1 2 3 4
query blank ✔ ✔
sales ✔
software ✔ ✔
country Canada ✔ ✔
US ✔ ✔ ✔
impressions 5 ✔
8 ✔
10 ✔ ✔ ✔
clicks 0 ✔ ✔
1 ✔ ✔ ✔
Inverted Index
field term doc list
query blank 1, 4
sales 2
software 0, 3
country Canada 0, 1
US 2, 3, 4
impressions 5 2
8 3
10 0, 1, 4
clicks 0 1, 2
1 0, 3, 4
Inverted Indexes
Allow you to:
● Quickly find all documents containing
a term
● Intersect several terms to perform
boolean queries
Inverted Index Optimizations
● Compressed data structures
○ Better use of RAM and processor cache
○ Better use of memory bandwidth
○ Increased CPU usage and time
● Optimizations matter!
Delta / VByte Encoding
● Doc id lists are sorted
● Delta between a doc id and the previous doc
id is sufficient
● Deltas are usually small integers
Delta Encoding
field term doc list
query nursing 34, 86, 247, 301, 674, 714
Delta Encoding
field term doc list
query nursing 34, 86, 247, 301, 674, 714
34, 52, 161, 54, 373, 40
Small Integer Compression
● Golomb/Rice
● VByte (or Varint)
● Binary Packing
● PForDelta
Small Integer Compression
● Golomb/Rice
● VByte
● Bit Packing
● PForDelta
VByte Encoding
9838
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
? 1 1 0 1 1 1 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9838
? 1 1 0 1 1 1 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
? 1 1 0 1 1 1 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
1 1 1 0 1 1 1 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
1 1 1 0 1 1 1 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
1 1 1 0 1 1 1 0
? 1 0 0 1 1 0 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
1 1 1 0 1 1 1 0
? 1 0 0 1 1 0 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
1 1 1 0 1 1 1 0
? 1 0 0 1 1 0 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
1 1 1 0 1 1 1 0
0 1 0 0 1 1 0 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
1 1 1 0 1 1 1 0
0 1 0 0 1 1 0 0
VByte
Pros:
● Compression
● Can fit more of index in RAM
● Higher information throughput per byte read
from disk
VByte
Cons:
● Decodes one byte at a time
● Lots of branch mispredictions
● Not fast to decode
● Largest ints expand to 5 bytes
Vectorized VByte Decoding
Optimized decoder implemented using x86_64
intrinsics
Vectorized VByte Decoding
01001010 11001000 01110001 01001110
10011011 01101010 10110101 00010111
01110110 10001101 10110011 11000001
Vectorized VByte Decoding
01001010 11001000 01110001 01001110
10011011 01101010 10110101 00010111
01110110 10001101 10110011 11000001
pmovmskb: Extract top bit of each byte
Vectorized VByte Decoding
01001010 11001000 01110001 01001110
10011011 01101010 10110101 00010111
01110110 10001101 10110011 11000001
pmovmskb: Extract top bit of each byte
010010100111
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of longest varint in bytes
● number of bytes to consume
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of longest varint in bytes
● number of bytes to consume
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of longest varint in bytes
● number of bytes to consume
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of longest varint in bytes
● number of bytes to consume
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of longest varint in bytes
● number of bytes to consume
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of longest varint in bytes
● number of bytes to consume
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of longest varint in bytes
● number of bytes to consume
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of longest varint in bytes
● number of bytes to consume
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of longest varint in bytes
● number of bytes to consume
010010100111
Decoding options for:
● sixteen 1 byte varints
● six 1-2 byte varints
● four 1-3 byte varints
● two 1-5 byte varints
010010100111
Decoding options for:
● sixteen 1 byte varints - special case
● six 1-2 byte varints - 2^6, 64 possibilities
● four 1-3 byte varints - 3^4, 81 possibilities
● two 1-5 byte varints - 5^2, 25 possibilities
170 total possibilities
010010100111
Data Distribution:
● Longer doc id lists are necessarily
composed of smaller deltas
● Most deltas in real datasets (ClueWeb09,
Indeed’s internal datasets) fall into 1 byte
case or 1-2 byte case
Most Significant Bit Decoding
● We separate most significant bit decoding
from integer decoding
● Reduces duplicate most significant bit
decoding work if we don’t consume all 12
bytes
● Better instruction level parallelism
010010100111
● If most significant bits of next 16 bytes are all
0, handle sixteen 1 byte ints case
● Otherwise lookup most significant bits of
next 12 bytes in 4096 entry lookup table
010010100111
Lookup table contains:
● Shuffle vector index from 0-169 representing
which possibility we are decoding
● Number of bytes of input that will be
consumed
010010100111
Branch on shuffle vector index to determine
which case we are decoding
● 0-63 - six 1-2 byte ints
● 64-144 - four 1-3 byte ints
● 145-169 - two 1-5 byte ints
Six 1-2 Byte Ints
01001010 11001000 01110001 01001110
10011011 01101010 10110101 00010111
01110110 10001101 10110011 11000001
Decode 6 varints from 9 bytes
Expected Positions
2 1 2 1 2 1 2 1 2 1 2 1 0 0 0 0
0 3 2 1 0 3 2 1 0 3 2 1 0 3 2 1
1 5 0 4 0 3 0 2 1 5 0 4 0 3 0 2
Six 1-2 byte ints
Four 1-3 byte ints
Two 1-5 byte ints
Six 1-2 Byte Ints
01001010 11001000 01110001 01001110
10011011 01101010 10110101 00010111
01110110 10001101 10110011 11000001
Pad out 1 byte ints to 2 bytes
Six 1-2 Byte Ints
01001010 00000000 11001000 01110001
01001110 00000000 10011011 01101010
10110101 00010111 01110110 00000000
Pad out 1 byte ints to 2 bytes
Shuffle input
● Use index to lookup appropriate shuffle
vector
● Shuffle input bytes to get them in the
expected positions
for (i = 0; i < 16; i++) {
if (mask[i] & 0x80) {
dest[i] = 0;
} else {
dest[i] = src[mask[i] & 0xF];
}
}
pshufb
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0 A6
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0 A6
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0 A6 0
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0 A6 0
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0 A6 0
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0 A6 0 43
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0 A6 0 43 7C DE 45 27 60 7C E3 43 A9
src
mask
dest
Shuffle input
● Use index to lookup appropriate shuffle
vector
● Shuffle input bytes to get them in the
expected positions
Six 1-2 Byte Ints
01001010 00000000 11001000 01110001
01001110 00000000 10011011 01101010
10110101 00010111 01110110 00000000
Reverse bytes in 2 byte varints
*not actually necessary since x86 is little endian
Six 1-2 Byte Ints
00000000 01001010 01110001 11001000
00000000 01001110 01101010 10011011
00010111 10110101 00000000 01110110
Reverse bytes in 2 byte varints
*not actually necessary since x86 is little endian
Six 1-2 Byte Ints
00000000 01001010 01110001 11001000
00000000 01001110 01101010 10011011
00010111 10110101 00000000 01110110
Mask out leading purple 1’s
Six 1-2 Byte Ints
00000000 01001010 01110001 01001000
00000000 01001110 01101010 00011011
00010111 00110101 00000000 01110110
Mask out leading purple 1’s
Six 1-2 Byte Ints
00000000 01001010 01110001 01001000
00000000 01001110 01101010 00011011
00010111 00110101 00000000 01110110
Shift top bytes of each varint 1 bit right
(mask/shift/or)
Six 1-2 Byte Ints
00000000 01001010 00111000 11001000
00000000 01001110 00110101 00011011
00001011 10110101 00000000 01110110
Shift top bytes of each varint 1 bit right
(mask/shift/or)
Six 1-2 Byte Ints
00000000 01001010 00111000 11001000
00000000 01001110 00110101 00011011
00001011 10110101 00000000 01110110
Done!
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 01110110
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 01110110
101101000110
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 01110110
101101000110
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 01110110
101101000110
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 01110110
101101000110
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 01110110
101101000110
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 01110110
101101000110
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 01110110
Decode 4 varints from 8 bytes
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 01110110
Pad ints to 4 bytes
Four 1-3 Byte Ints
11101110 00011101 00000000 00000000
11110101 11101101 01111001 00000000
11111000 01101001 00000000 00000000
00100001 00000000 00000000 00000000
Pad ints to 4 bytes
Four 1-3 Byte Ints
00000000 00000000 00011101 11101110
00000000 01111001 11101101 11110101
00000000 00000000 01101001 11111000
00000000 00000000 00000000 00100001
Reverse bytes
*not actually necessary since x86 is little endian
Four 1-3 Byte Ints
00000000 00000000 00011101 11101110
00000000 01111001 11101101 11110101
00000000 00000000 01101001 11111000
00000000 00000000 00000000 00100001
Clear top bit of each byte
Four 1-3 Byte Ints
00000000 00000000 00011101 01101110
00000000 01111001 01101101 01110101
00000000 00000000 01101001 01111000
00000000 00000000 00000000 00100001
Clear top bit of each byte
Four 1-3 Byte Ints
00000000 00000000 00011101 01101110
00000000 01111001 01101101 01110101
00000000 00000000 01101001 01111000
00000000 00000000 00000000 00100001
Shift 2nd least significant bytes over by 1 bit
(mask/shift/or)
Four 1-3 Byte Ints
00000000 00000000 00001110 11101110
00000000 01111001 00110110 11110101
00000000 00000000 00110100 11111000
00000000 00000000 00000000 00100001
Shift 2nd least significant bytes over by 1 bit
(mask/shift/or)
Four 1-3 Byte Ints
00000000 00000000 00001110 11101110
00000000 01111001 00110110 11110101
00000000 00000000 00110100 11111000
00000000 00000000 00000000 00100001
Shift 3rd least significant bytes over by 2 bits
(mask/shift/or)
Four 1-3 Byte Ints
00000000 00000000 00001110 11101110
00000000 00011110 01110110 11110101
00000000 00000000 00110100 11111000
00000000 00000000 00000000 00100001
Shift 3rd least significant bytes over by 2 bits
(mask/shift/or)
Four 1-3 Byte Ints
00000000 00000000 00001110 11101110
00000000 00011110 01110110 11110101
00000000 00000000 00110100 11111000
00000000 00000000 00000000 00100001
Done!
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110110
111101110110
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110110
111101110110
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110110
111101110110
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110110
111101110110
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110110
Decode 2 varints from 9 bytes
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110110
● Could handle the same way as other cases
● Would require 5 AND operations, 4 shift
operations, and 4 OR operations
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110110
● We can simulate shifting by different
amounts with multiplication
● Only needs 1 multiplication, 1 shift, 1 OR, 1
shuffle
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110110
1 5 0 4 0 3 0 2 1 5 0 4 0 3 0 2
Two 1-5 byte ints
Treat SIMD register as eight 16 bit registers, loading 1 byte into
each. First byte doesn’t need to be shifted.
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110110
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110110
11101110 00000000 00000000 00000000
00000000 00000000 00000000 00000000
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110110
11101110 00000000 00000000 00000000
00000000 00000000 00000000 10011101
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110110
11101110 00000000 00000000 00000000
00000000 11110101 00000000 10011101
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110110
11101110 00000000 00000000 11101101
00000000 11110101 00000000 10011101
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110110
11101110 00000011 00000000 11101101
00000000 11110101 00000000 10011101
Two 1-5 Byte Ints
11101110 00000011
00000000 11101101
00000000 11110101
00000000 10011101
Clear top bit of each byte
Two 1-5 Byte Ints
01101110 00000011
00000000 01101101
00000000 01110101
00000000 00011101
Clear top bit of each byte
Two 1-5 Byte Ints
01101110 00000011 * 16 (<< 4)
00000000 01101101 * 32 (<< 5)
00000000 01110101 * 64 (<< 6)
00000000 00011101 * 128 (<< 7)
Multiply to shift bits into place
Two 1-5 Byte Ints
01101110 00000011 * 16 (<< 4)
00000000 01101101 * 32 (<< 5)
00000000 01110101 * 64 (<< 6)
00000000 00011101 * 128 (<< 7)
Multiply to shift bits into place
Two 1-5 Byte Ints
11100000 00110000 * 16 (<< 4)
00001101 10100000 * 32 (<< 5)
00011101 01000000 * 64 (<< 6)
00001110 10000000 * 128 (<< 7)
Multiply to shift bits into place
Two 1-5 Byte Ints
11100000 00110000 00001101 10100000
00011101 01000000 00001110 10000000
Two 1-5 Byte Ints
11100000 00110000 00001101 10100000
00011101 01000000 00001110 10000000
Left shift everything by 8 bits
Two 1-5 Byte Ints
11100000 00110000 00001101 10100000
00011101 01000000 00001110 10000000
Left shift everything by 8 bits
00110000 00001101 10100000 00011101
01000000 00001110 10000000 00000000
Two 1-5 Byte Ints
11100000 00110000 00001101 10100000
00011101 01000000 00001110 10000000
Bitwise OR pre-shifted and shifted registers
00110000 00001101 10100000 00011101
01000000 00001110 10000000 00000000
Two 1-5 Byte Ints
11100000 00110000 00001101 10100000
00011101 01000000 00001110 10000000
Bitwise OR pre-shifted and shifted registers
00110000 00001101 10100000 00011101
01000000 00001110 10000000 00000000
Two 1-5 Byte Ints
11110000 00110000 00001101 10100000
00011101 01000000 00001110 10000000
Bitwise OR pre-shifted and shifted registers
00110000 00001101 10100000 00011101
01000000 00001110 10000000 00000000
Two 1-5 Byte Ints
11110000 00110000 00001101 10100000
00011101 01000000 00001110 10000000
Bitwise OR pre-shifted and shifted registers
00110000 00001101 10100000 00011101
01000000 00001110 10000000 00000000
Two 1-5 Byte Ints
11110000 00111101 10101101 10100000
00011101 01000000 00001110 10000000
Bitwise OR pre-shifted and shifted registers
00110000 00001101 10100000 00011101
01000000 00001110 10000000 00000000
Two 1-5 Byte Ints
11110000 00111101 10101101 10100000
00011101 01000000 00001110 10000000
Bitwise OR pre-shifted and shifted registers
00110000 00001101 10100000 00011101
01000000 00001110 10000000 00000000
Two 1-5 Byte Ints
11110000 00111101 10101101 10111101
01011101 01000000 00001110 10000000
Bitwise OR pre-shifted and shifted registers
00110000 00001101 10100000 00011101
01000000 00001110 10000000 00000000
Two 1-5 Byte Ints
11110000 00111101 10101101 10111101
01011101 01000000 00001110 10000000
Bitwise OR pre-shifted and shifted registers
00110000 00001101 10100000 00011101
01000000 00001110 10000000 00000000
Two 1-5 Byte Ints
11110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Bitwise OR pre-shifted and shifted registers
00110000 00001101 10100000 00011101
01000000 00001110 10000000 00000000
Two 1-5 Byte Ints
11110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other byte
Two 1-5 Byte Ints
11110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other byte
Two 1-5 Byte Ints
11110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other byte
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other byte
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other byte
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other byte
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other byte
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other byte
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other byte
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other byte
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
OR in low 7 bits of least significant byte
(remember that we stored it in most significant
byte position originally)
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 11101110
OR in low 7 bits of least significant byte
(remember that we stored it in most significant
byte position originally)
Two 1-5 Byte Ints
00111101 10111101 01001110 11101110
Final result!
Two 1-5 Byte Ints
00111101 10111101 01001110 11101110
Final result!
Checking my work against initial varint:
11101110 10011101 11110101 11101101
00000011
Two 1-5 Byte Ints
00111101 10111101 01001110 11101110
Final result!
Checking my work against initial varint:
11101110 10011101 11110101 11101101
00000011
Two 1-5 Byte Ints
00111101 10111101 01001110 11101110
Final result!
Checking my work against initial varint:
11101110 10011101 11110101 11101101
00000011
Two 1-5 Byte Ints
00111101 10111101 01001110 11101110
Final result!
Checking my work against initial varint:
11101110 10011101 11110101 11101101
00000011
Two 1-5 Byte Ints
00111101 10111101 01001110 11101110
Final result!
Checking my work against initial varint:
11101110 10011101 11110101 11101101
00000011
Two 1-5 Byte Ints
00111101 10111101 01001110 11101110
Final result!
Checking my work against initial varint:
11101110 10011101 11110101 11101101
00000011
Results
Results
Q&A

Contenu connexe

Tendances

TLS 1.3 と 0-RTT のこわ〜い話
TLS 1.3 と 0-RTT のこわ〜い話TLS 1.3 と 0-RTT のこわ〜い話
TLS 1.3 と 0-RTT のこわ〜い話Kazuho Oku
 
ダークネットのはなし #ssmjp
ダークネットのはなし #ssmjpダークネットのはなし #ssmjp
ダークネットのはなし #ssmjpsonickun
 
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph KernelsDDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph Kernelsivaderivader
 
Vivliostyle.js における CSS Paged Media の実装
Vivliostyle.js における CSS Paged Media の実装Vivliostyle.js における CSS Paged Media の実装
Vivliostyle.js における CSS Paged Media の実装Shinyu Murakami
 
Documentação da infraestrutura de rede
Documentação da infraestrutura de redeDocumentação da infraestrutura de rede
Documentação da infraestrutura de redeMarcos Monteiro
 
IT Essentials (Version 7.0) - ITE Chapter 13 Exam Answers
IT Essentials (Version 7.0) - ITE Chapter 13 Exam AnswersIT Essentials (Version 7.0) - ITE Chapter 13 Exam Answers
IT Essentials (Version 7.0) - ITE Chapter 13 Exam AnswersITExamAnswers.net
 
U-NEXT事例発表-レコメンドシステムのこれまでとこれから
U-NEXT事例発表-レコメンドシステムのこれまでとこれからU-NEXT事例発表-レコメンドシステムのこれまでとこれから
U-NEXT事例発表-レコメンドシステムのこれまでとこれからTakatoshi Kakimoto
 
SDN and Mininet: Some Basic Concepts
SDN and Mininet: Some Basic ConceptsSDN and Mininet: Some Basic Concepts
SDN and Mininet: Some Basic ConceptsEswar Publications
 
2022のShowNetに向けて_ShowNet2021_conf_mini_5_2022_stm
2022のShowNetに向けて_ShowNet2021_conf_mini_5_2022_stm2022のShowNetに向けて_ShowNet2021_conf_mini_5_2022_stm
2022のShowNetに向けて_ShowNet2021_conf_mini_5_2022_stmInterop Tokyo ShowNet NOC Team
 
詳説データベース輪読会: 分散合意その2
詳説データベース輪読会: 分散合意その2詳説データベース輪読会: 分散合意その2
詳説データベース輪読会: 分散合意その2Sho Nakazono
 
SD-WAN導入の現場でみえてきたアレコレ
SD-WAN導入の現場でみえてきたアレコレSD-WAN導入の現場でみえてきたアレコレ
SD-WAN導入の現場でみえてきたアレコレKoji Komatsu
 
How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.Naoto MATSUMOTO
 
「速効サプリ」情報処理安全確保支援士塾 #4
「速効サプリ」情報処理安全確保支援士塾 #4「速効サプリ」情報処理安全確保支援士塾 #4
「速効サプリ」情報処理安全確保支援士塾 #4Naoki MURAYAMA
 
さいきんのMySQLに関する取り組み(仮)
さいきんのMySQLに関する取り組み(仮)さいきんのMySQLに関する取り組み(仮)
さいきんのMySQLに関する取り組み(仮)Takanori Sejima
 
シェルスクリプトで簡易ping監視
シェルスクリプトで簡易ping監視シェルスクリプトで簡易ping監視
シェルスクリプトで簡易ping監視Kazuhiro Nishiyama
 

Tendances (20)

TLS 1.3 と 0-RTT のこわ〜い話
TLS 1.3 と 0-RTT のこわ〜い話TLS 1.3 と 0-RTT のこわ〜い話
TLS 1.3 と 0-RTT のこわ〜い話
 
ダークネットのはなし #ssmjp
ダークネットのはなし #ssmjpダークネットのはなし #ssmjp
ダークネットのはなし #ssmjp
 
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph KernelsDDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
 
47 ĐỀ ÔN THI HỌC KÌ 1 - TOÁN LỚP 2
47 ĐỀ ÔN THI HỌC KÌ 1 - TOÁN LỚP 247 ĐỀ ÔN THI HỌC KÌ 1 - TOÁN LỚP 2
47 ĐỀ ÔN THI HỌC KÌ 1 - TOÁN LỚP 2
 
Vivliostyle.js における CSS Paged Media の実装
Vivliostyle.js における CSS Paged Media の実装Vivliostyle.js における CSS Paged Media の実装
Vivliostyle.js における CSS Paged Media の実装
 
Documentação da infraestrutura de rede
Documentação da infraestrutura de redeDocumentação da infraestrutura de rede
Documentação da infraestrutura de rede
 
IT Essentials (Version 7.0) - ITE Chapter 13 Exam Answers
IT Essentials (Version 7.0) - ITE Chapter 13 Exam AnswersIT Essentials (Version 7.0) - ITE Chapter 13 Exam Answers
IT Essentials (Version 7.0) - ITE Chapter 13 Exam Answers
 
U-NEXT事例発表-レコメンドシステムのこれまでとこれから
U-NEXT事例発表-レコメンドシステムのこれまでとこれからU-NEXT事例発表-レコメンドシステムのこれまでとこれから
U-NEXT事例発表-レコメンドシステムのこれまでとこれから
 
SDN and Mininet: Some Basic Concepts
SDN and Mininet: Some Basic ConceptsSDN and Mininet: Some Basic Concepts
SDN and Mininet: Some Basic Concepts
 
Apache NiFi の紹介 #streamctjp
Apache NiFi の紹介  #streamctjpApache NiFi の紹介  #streamctjp
Apache NiFi の紹介 #streamctjp
 
2022のShowNetに向けて_ShowNet2021_conf_mini_5_2022_stm
2022のShowNetに向けて_ShowNet2021_conf_mini_5_2022_stm2022のShowNetに向けて_ShowNet2021_conf_mini_5_2022_stm
2022のShowNetに向けて_ShowNet2021_conf_mini_5_2022_stm
 
Virtual Chassis Fabric for Cloud Builder
Virtual Chassis Fabric for Cloud BuilderVirtual Chassis Fabric for Cloud Builder
Virtual Chassis Fabric for Cloud Builder
 
詳説データベース輪読会: 分散合意その2
詳説データベース輪読会: 分散合意その2詳説データベース輪読会: 分散合意その2
詳説データベース輪読会: 分散合意その2
 
SD-WAN導入の現場でみえてきたアレコレ
SD-WAN導入の現場でみえてきたアレコレSD-WAN導入の現場でみえてきたアレコレ
SD-WAN導入の現場でみえてきたアレコレ
 
How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.
 
「速効サプリ」情報処理安全確保支援士塾 #4
「速効サプリ」情報処理安全確保支援士塾 #4「速効サプリ」情報処理安全確保支援士塾 #4
「速効サプリ」情報処理安全確保支援士塾 #4
 
41 bài toán tính tuổi - tính ngày có đáp án
41 bài toán tính tuổi - tính ngày có đáp án41 bài toán tính tuổi - tính ngày có đáp án
41 bài toán tính tuổi - tính ngày có đáp án
 
さいきんのMySQLに関する取り組み(仮)
さいきんのMySQLに関する取り組み(仮)さいきんのMySQLに関する取り組み(仮)
さいきんのMySQLに関する取り組み(仮)
 
IIJmio meeting 23 DNSフィルタリングをなぜ行うのか
IIJmio meeting 23 DNSフィルタリングをなぜ行うのかIIJmio meeting 23 DNSフィルタリングをなぜ行うのか
IIJmio meeting 23 DNSフィルタリングをなぜ行うのか
 
シェルスクリプトで簡易ping監視
シェルスクリプトで簡易ping監視シェルスクリプトで簡易ping監視
シェルスクリプトで簡易ping監視
 

Similaire à MaskedVByte: SIMD-accelerated VByte

How Computer Games Help Children Learn (Stockholm University Dept of Educatio...
How Computer Games Help Children Learn (Stockholm University Dept of Educatio...How Computer Games Help Children Learn (Stockholm University Dept of Educatio...
How Computer Games Help Children Learn (Stockholm University Dept of Educatio...dws1d
 
Guia de configuracao_ms9520
Guia de configuracao_ms9520Guia de configuracao_ms9520
Guia de configuracao_ms9520plisleo
 
Columnar processing for SQL-on-Hadoop: The best is yet to come
Columnar processing for SQL-on-Hadoop: The best is yet to comeColumnar processing for SQL-on-Hadoop: The best is yet to come
Columnar processing for SQL-on-Hadoop: The best is yet to comeWang Zuo
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Tensorflow and python : fault detection system - PyCon Taiwan 2017
Tensorflow and python : fault detection system - PyCon Taiwan 2017Tensorflow and python : fault detection system - PyCon Taiwan 2017
Tensorflow and python : fault detection system - PyCon Taiwan 2017Eric Ahn
 
4 bit lcd_interfacing_with_arm7_primer
4 bit lcd_interfacing_with_arm7_primer4 bit lcd_interfacing_with_arm7_primer
4 bit lcd_interfacing_with_arm7_primerpvmistary
 
4 bit lcd_interfacing_with_arm7_primer
4 bit lcd_interfacing_with_arm7_primer4 bit lcd_interfacing_with_arm7_primer
4 bit lcd_interfacing_with_arm7_primerpvmistary
 
Number System | Types of Number System | Binary Number System | Octal Number ...
Number System | Types of Number System | Binary Number System | Octal Number ...Number System | Types of Number System | Binary Number System | Octal Number ...
Number System | Types of Number System | Binary Number System | Octal Number ...Get & Spread Knowledge
 
Displaying Animated Images on GLCD display with LPC2148 Microcontroller
Displaying Animated Images on GLCD display with LPC2148 MicrocontrollerDisplaying Animated Images on GLCD display with LPC2148 Microcontroller
Displaying Animated Images on GLCD display with LPC2148 MicrocontrollerOmkar Rane
 
Monetdb basic bat operation
Monetdb basic bat operationMonetdb basic bat operation
Monetdb basic bat operationChen Wang
 
Block Cipher.cryptography_miu_year5.pptx
Block Cipher.cryptography_miu_year5.pptxBlock Cipher.cryptography_miu_year5.pptx
Block Cipher.cryptography_miu_year5.pptxHodaAhmedBekhitAhmed
 
2011 DDR4 Mini Workshop.pdf
2011 DDR4 Mini Workshop.pdf2011 DDR4 Mini Workshop.pdf
2011 DDR4 Mini Workshop.pdfssuser2a2430
 
Data Encryption Standard (DES)
Data Encryption Standard (DES)Data Encryption Standard (DES)
Data Encryption Standard (DES)Amir Masinaei
 
16 AVR Instruction Encoding.pdf
16 AVR Instruction Encoding.pdf16 AVR Instruction Encoding.pdf
16 AVR Instruction Encoding.pdfHuzaifa747109
 

Similaire à MaskedVByte: SIMD-accelerated VByte (20)

IPv6 Basics
IPv6 BasicsIPv6 Basics
IPv6 Basics
 
How Computer Games Help Children Learn (Stockholm University Dept of Educatio...
How Computer Games Help Children Learn (Stockholm University Dept of Educatio...How Computer Games Help Children Learn (Stockholm University Dept of Educatio...
How Computer Games Help Children Learn (Stockholm University Dept of Educatio...
 
Guia de configuracao_ms9520
Guia de configuracao_ms9520Guia de configuracao_ms9520
Guia de configuracao_ms9520
 
Columnar processing for SQL-on-Hadoop: The best is yet to come
Columnar processing for SQL-on-Hadoop: The best is yet to comeColumnar processing for SQL-on-Hadoop: The best is yet to come
Columnar processing for SQL-on-Hadoop: The best is yet to come
 
Inside Winnyp
Inside WinnypInside Winnyp
Inside Winnyp
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Tensorflow and python : fault detection system - PyCon Taiwan 2017
Tensorflow and python : fault detection system - PyCon Taiwan 2017Tensorflow and python : fault detection system - PyCon Taiwan 2017
Tensorflow and python : fault detection system - PyCon Taiwan 2017
 
4 bit lcd_interfacing_with_arm7_primer
4 bit lcd_interfacing_with_arm7_primer4 bit lcd_interfacing_with_arm7_primer
4 bit lcd_interfacing_with_arm7_primer
 
4 bit lcd_interfacing_with_arm7_primer
4 bit lcd_interfacing_with_arm7_primer4 bit lcd_interfacing_with_arm7_primer
4 bit lcd_interfacing_with_arm7_primer
 
Number System | Types of Number System | Binary Number System | Octal Number ...
Number System | Types of Number System | Binary Number System | Octal Number ...Number System | Types of Number System | Binary Number System | Octal Number ...
Number System | Types of Number System | Binary Number System | Octal Number ...
 
Displaying Animated Images on GLCD display with LPC2148 Microcontroller
Displaying Animated Images on GLCD display with LPC2148 MicrocontrollerDisplaying Animated Images on GLCD display with LPC2148 Microcontroller
Displaying Animated Images on GLCD display with LPC2148 Microcontroller
 
Monetdb basic bat operation
Monetdb basic bat operationMonetdb basic bat operation
Monetdb basic bat operation
 
Block Cipher.cryptography_miu_year5.pptx
Block Cipher.cryptography_miu_year5.pptxBlock Cipher.cryptography_miu_year5.pptx
Block Cipher.cryptography_miu_year5.pptx
 
2011 DDR4 Mini Workshop.pdf
2011 DDR4 Mini Workshop.pdf2011 DDR4 Mini Workshop.pdf
2011 DDR4 Mini Workshop.pdf
 
Data Encryption Standard (DES)
Data Encryption Standard (DES)Data Encryption Standard (DES)
Data Encryption Standard (DES)
 
crack satellite
crack satellite crack satellite
crack satellite
 
Data compession
Data compession Data compession
Data compession
 
16 AVR Instruction Encoding.pdf
16 AVR Instruction Encoding.pdf16 AVR Instruction Encoding.pdf
16 AVR Instruction Encoding.pdf
 
No more dumb hex!
No more dumb hex!No more dumb hex!
No more dumb hex!
 

Plus de Daniel Lemire

Accurate and efficient software microbenchmarks
Accurate and efficient software microbenchmarksAccurate and efficient software microbenchmarks
Accurate and efficient software microbenchmarksDaniel Lemire
 
Fast indexes with roaring #gomtl-10
Fast indexes with roaring #gomtl-10 Fast indexes with roaring #gomtl-10
Fast indexes with roaring #gomtl-10 Daniel Lemire
 
Parsing JSON Really Quickly: Lessons Learned
Parsing JSON Really Quickly: Lessons LearnedParsing JSON Really Quickly: Lessons Learned
Parsing JSON Really Quickly: Lessons LearnedDaniel Lemire
 
Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)Daniel Lemire
 
Ingénierie de la performance au sein des mégadonnées
Ingénierie de la performance au sein des mégadonnéesIngénierie de la performance au sein des mégadonnées
Ingénierie de la performance au sein des mégadonnéesDaniel Lemire
 
SIMD Compression and the Intersection of Sorted Integers
SIMD Compression and the Intersection of Sorted IntegersSIMD Compression and the Intersection of Sorted Integers
SIMD Compression and the Intersection of Sorted IntegersDaniel Lemire
 
Decoding billions of integers per second through vectorization
Decoding billions of integers per second through vectorizationDecoding billions of integers per second through vectorization
Decoding billions of integers per second through vectorizationDaniel Lemire
 
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...Daniel Lemire
 
Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)Daniel Lemire
 
Engineering fast indexes
Engineering fast indexesEngineering fast indexes
Engineering fast indexesDaniel Lemire
 
Roaring Bitmaps (January 2016)
Roaring Bitmaps (January 2016)Roaring Bitmaps (January 2016)
Roaring Bitmaps (January 2016)Daniel Lemire
 
Roaring Bitmap : June 2015 report
Roaring Bitmap : June 2015 reportRoaring Bitmap : June 2015 report
Roaring Bitmap : June 2015 reportDaniel Lemire
 
La vectorisation des algorithmes de compression
La vectorisation des algorithmes de compression La vectorisation des algorithmes de compression
La vectorisation des algorithmes de compression Daniel Lemire
 
Decoding billions of integers per second through vectorization
Decoding billions of integers per second through vectorization  Decoding billions of integers per second through vectorization
Decoding billions of integers per second through vectorization Daniel Lemire
 
Extracting, Transforming and Archiving Scientific Data
Extracting, Transforming and Archiving Scientific DataExtracting, Transforming and Archiving Scientific Data
Extracting, Transforming and Archiving Scientific DataDaniel Lemire
 
Innovation without permission: from Codd to NoSQL
Innovation without permission: from Codd to NoSQLInnovation without permission: from Codd to NoSQL
Innovation without permission: from Codd to NoSQLDaniel Lemire
 
Faster Column-Oriented Indexes
Faster Column-Oriented IndexesFaster Column-Oriented Indexes
Faster Column-Oriented IndexesDaniel Lemire
 
Compressing column-oriented indexes
Compressing column-oriented indexesCompressing column-oriented indexes
Compressing column-oriented indexesDaniel Lemire
 

Plus de Daniel Lemire (20)

Accurate and efficient software microbenchmarks
Accurate and efficient software microbenchmarksAccurate and efficient software microbenchmarks
Accurate and efficient software microbenchmarks
 
Fast indexes with roaring #gomtl-10
Fast indexes with roaring #gomtl-10 Fast indexes with roaring #gomtl-10
Fast indexes with roaring #gomtl-10
 
Parsing JSON Really Quickly: Lessons Learned
Parsing JSON Really Quickly: Lessons LearnedParsing JSON Really Quickly: Lessons Learned
Parsing JSON Really Quickly: Lessons Learned
 
Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)
 
Ingénierie de la performance au sein des mégadonnées
Ingénierie de la performance au sein des mégadonnéesIngénierie de la performance au sein des mégadonnées
Ingénierie de la performance au sein des mégadonnées
 
SIMD Compression and the Intersection of Sorted Integers
SIMD Compression and the Intersection of Sorted IntegersSIMD Compression and the Intersection of Sorted Integers
SIMD Compression and the Intersection of Sorted Integers
 
Decoding billions of integers per second through vectorization
Decoding billions of integers per second through vectorizationDecoding billions of integers per second through vectorization
Decoding billions of integers per second through vectorization
 
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
 
Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)
 
Engineering fast indexes
Engineering fast indexesEngineering fast indexes
Engineering fast indexes
 
Roaring Bitmaps (January 2016)
Roaring Bitmaps (January 2016)Roaring Bitmaps (January 2016)
Roaring Bitmaps (January 2016)
 
Roaring Bitmap : June 2015 report
Roaring Bitmap : June 2015 reportRoaring Bitmap : June 2015 report
Roaring Bitmap : June 2015 report
 
La vectorisation des algorithmes de compression
La vectorisation des algorithmes de compression La vectorisation des algorithmes de compression
La vectorisation des algorithmes de compression
 
OLAP and more
OLAP and moreOLAP and more
OLAP and more
 
Decoding billions of integers per second through vectorization
Decoding billions of integers per second through vectorization  Decoding billions of integers per second through vectorization
Decoding billions of integers per second through vectorization
 
Extracting, Transforming and Archiving Scientific Data
Extracting, Transforming and Archiving Scientific DataExtracting, Transforming and Archiving Scientific Data
Extracting, Transforming and Archiving Scientific Data
 
Innovation without permission: from Codd to NoSQL
Innovation without permission: from Codd to NoSQLInnovation without permission: from Codd to NoSQL
Innovation without permission: from Codd to NoSQL
 
Write good papers
Write good papersWrite good papers
Write good papers
 
Faster Column-Oriented Indexes
Faster Column-Oriented IndexesFaster Column-Oriented Indexes
Faster Column-Oriented Indexes
 
Compressing column-oriented indexes
Compressing column-oriented indexesCompressing column-oriented indexes
Compressing column-oriented indexes
 

Dernier

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Dernier (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

MaskedVByte: SIMD-accelerated VByte

  • 1. Vectorized VByte Decoding Jeff Plaisance, Nathan Kurz, Daniel Lemire
  • 3. Inverted Index ● Like index in the back of a book ● words = terms, page numbers = doc ids ● Term list is sorted ● Doc list for each term is sorted
  • 4. doc id query country impressions clicks 0 software Canada 10 1 1 blank Canada 10 0 2 sales US 5 0 3 software US 8 1 4 blank US 10 1 Standard Index
  • 5. Constructing an Inverted Index query country impression clicks doc id blank sales software Canada US 5 8 10 0 1 0 ✔ ✔ ✔ ✔ 1 ✔ ✔ ✔ ✔ 2 ✔ ✔ ✔ ✔ 3 ✔ ✔ ✔ ✔ 4 ✔ ✔ ✔ ✔
  • 6. Constructing an Inverted Index field term 0 1 2 3 4 query blank ✔ ✔ sales ✔ software ✔ ✔ country Canada ✔ ✔ US ✔ ✔ ✔ impressions 5 ✔ 8 ✔ 10 ✔ ✔ ✔ clicks 0 ✔ ✔ 1 ✔ ✔ ✔
  • 7. Inverted Index field term doc list query blank 1, 4 sales 2 software 0, 3 country Canada 0, 1 US 2, 3, 4 impressions 5 2 8 3 10 0, 1, 4 clicks 0 1, 2 1 0, 3, 4
  • 8. Inverted Indexes Allow you to: ● Quickly find all documents containing a term ● Intersect several terms to perform boolean queries
  • 9. Inverted Index Optimizations ● Compressed data structures ○ Better use of RAM and processor cache ○ Better use of memory bandwidth ○ Increased CPU usage and time ● Optimizations matter!
  • 10. Delta / VByte Encoding ● Doc id lists are sorted ● Delta between a doc id and the previous doc id is sufficient ● Deltas are usually small integers
  • 11. Delta Encoding field term doc list query nursing 34, 86, 247, 301, 674, 714
  • 12. Delta Encoding field term doc list query nursing 34, 86, 247, 301, 674, 714 34, 52, 161, 54, 373, 40
  • 13. Small Integer Compression ● Golomb/Rice ● VByte (or Varint) ● Binary Packing ● PForDelta
  • 14. Small Integer Compression ● Golomb/Rice ● VByte ● Bit Packing ● PForDelta
  • 16. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838
  • 17. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838
  • 18. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838
  • 19. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838
  • 20. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 ? 1 1 0 1 1 1 0
  • 21. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9838 ? 1 1 0 1 1 1 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
  • 22. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 ? 1 1 0 1 1 1 0
  • 23. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 1 1 1 0 1 1 1 0
  • 24. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 1 1 1 0 1 1 1 0
  • 25. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 1 1 1 0 1 1 1 0 ? 1 0 0 1 1 0 0
  • 26. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 1 1 1 0 1 1 1 0 ? 1 0 0 1 1 0 0
  • 27. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 1 1 1 0 1 1 1 0 ? 1 0 0 1 1 0 0
  • 28. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 1 1 1 0 1 1 1 0 0 1 0 0 1 1 0 0
  • 29. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 1 1 1 0 1 1 1 0 0 1 0 0 1 1 0 0
  • 30. VByte Pros: ● Compression ● Can fit more of index in RAM ● Higher information throughput per byte read from disk
  • 31. VByte Cons: ● Decodes one byte at a time ● Lots of branch mispredictions ● Not fast to decode ● Largest ints expand to 5 bytes
  • 32. Vectorized VByte Decoding Optimized decoder implemented using x86_64 intrinsics
  • 33. Vectorized VByte Decoding 01001010 11001000 01110001 01001110 10011011 01101010 10110101 00010111 01110110 10001101 10110011 11000001
  • 34. Vectorized VByte Decoding 01001010 11001000 01110001 01001110 10011011 01101010 10110101 00010111 01110110 10001101 10110011 11000001 pmovmskb: Extract top bit of each byte
  • 35. Vectorized VByte Decoding 01001010 11001000 01110001 01001110 10011011 01101010 10110101 00010111 01110110 10001101 10110011 11000001 pmovmskb: Extract top bit of each byte 010010100111
  • 36. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  • 37. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  • 38. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  • 39. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  • 40. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  • 41. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  • 42. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  • 43. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  • 44. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  • 45. 010010100111 Decoding options for: ● sixteen 1 byte varints ● six 1-2 byte varints ● four 1-3 byte varints ● two 1-5 byte varints
  • 46. 010010100111 Decoding options for: ● sixteen 1 byte varints - special case ● six 1-2 byte varints - 2^6, 64 possibilities ● four 1-3 byte varints - 3^4, 81 possibilities ● two 1-5 byte varints - 5^2, 25 possibilities 170 total possibilities
  • 47. 010010100111 Data Distribution: ● Longer doc id lists are necessarily composed of smaller deltas ● Most deltas in real datasets (ClueWeb09, Indeed’s internal datasets) fall into 1 byte case or 1-2 byte case
  • 48. Most Significant Bit Decoding ● We separate most significant bit decoding from integer decoding ● Reduces duplicate most significant bit decoding work if we don’t consume all 12 bytes ● Better instruction level parallelism
  • 49. 010010100111 ● If most significant bits of next 16 bytes are all 0, handle sixteen 1 byte ints case ● Otherwise lookup most significant bits of next 12 bytes in 4096 entry lookup table
  • 50. 010010100111 Lookup table contains: ● Shuffle vector index from 0-169 representing which possibility we are decoding ● Number of bytes of input that will be consumed
  • 51. 010010100111 Branch on shuffle vector index to determine which case we are decoding ● 0-63 - six 1-2 byte ints ● 64-144 - four 1-3 byte ints ● 145-169 - two 1-5 byte ints
  • 52. Six 1-2 Byte Ints 01001010 11001000 01110001 01001110 10011011 01101010 10110101 00010111 01110110 10001101 10110011 11000001 Decode 6 varints from 9 bytes
  • 53. Expected Positions 2 1 2 1 2 1 2 1 2 1 2 1 0 0 0 0 0 3 2 1 0 3 2 1 0 3 2 1 0 3 2 1 1 5 0 4 0 3 0 2 1 5 0 4 0 3 0 2 Six 1-2 byte ints Four 1-3 byte ints Two 1-5 byte ints
  • 54. Six 1-2 Byte Ints 01001010 11001000 01110001 01001110 10011011 01101010 10110101 00010111 01110110 10001101 10110011 11000001 Pad out 1 byte ints to 2 bytes
  • 55. Six 1-2 Byte Ints 01001010 00000000 11001000 01110001 01001110 00000000 10011011 01101010 10110101 00010111 01110110 00000000 Pad out 1 byte ints to 2 bytes
  • 56. Shuffle input ● Use index to lookup appropriate shuffle vector ● Shuffle input bytes to get them in the expected positions
  • 57. for (i = 0; i < 16; i++) { if (mask[i] & 0x80) { dest[i] = 0; } else { dest[i] = src[mask[i] & 0xF]; } } pshufb
  • 58. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 src mask dest
  • 59. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 src mask dest
  • 60. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 src mask dest
  • 61. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE src mask dest
  • 62. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE src mask dest
  • 63. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE src mask dest
  • 64. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 src mask dest
  • 65. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 src mask dest
  • 66. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 src mask dest
  • 67. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 src mask dest
  • 68. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 src mask dest
  • 69. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 src mask dest
  • 70. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 src mask dest
  • 71. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 src mask dest
  • 72. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 A6 src mask dest
  • 73. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 A6 src mask dest
  • 74. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 A6 0 src mask dest
  • 75. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 A6 0 src mask dest
  • 76. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 A6 0 src mask dest
  • 77. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 A6 0 43 src mask dest
  • 78. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 A6 0 43 7C DE 45 27 60 7C E3 43 A9 src mask dest
  • 79. Shuffle input ● Use index to lookup appropriate shuffle vector ● Shuffle input bytes to get them in the expected positions
  • 80. Six 1-2 Byte Ints 01001010 00000000 11001000 01110001 01001110 00000000 10011011 01101010 10110101 00010111 01110110 00000000 Reverse bytes in 2 byte varints *not actually necessary since x86 is little endian
  • 81. Six 1-2 Byte Ints 00000000 01001010 01110001 11001000 00000000 01001110 01101010 10011011 00010111 10110101 00000000 01110110 Reverse bytes in 2 byte varints *not actually necessary since x86 is little endian
  • 82. Six 1-2 Byte Ints 00000000 01001010 01110001 11001000 00000000 01001110 01101010 10011011 00010111 10110101 00000000 01110110 Mask out leading purple 1’s
  • 83. Six 1-2 Byte Ints 00000000 01001010 01110001 01001000 00000000 01001110 01101010 00011011 00010111 00110101 00000000 01110110 Mask out leading purple 1’s
  • 84. Six 1-2 Byte Ints 00000000 01001010 01110001 01001000 00000000 01001110 01101010 00011011 00010111 00110101 00000000 01110110 Shift top bytes of each varint 1 bit right (mask/shift/or)
  • 85. Six 1-2 Byte Ints 00000000 01001010 00111000 11001000 00000000 01001110 00110101 00011011 00001011 10110101 00000000 01110110 Shift top bytes of each varint 1 bit right (mask/shift/or)
  • 86. Six 1-2 Byte Ints 00000000 01001010 00111000 11001000 00000000 01001110 00110101 00011011 00001011 10110101 00000000 01110110 Done!
  • 87. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110
  • 88. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110 101101000110
  • 89. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110 101101000110
  • 90. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110 101101000110
  • 91. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110 101101000110
  • 92. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110 101101000110
  • 93. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110 101101000110
  • 94. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110 Decode 4 varints from 8 bytes
  • 95. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110 Pad ints to 4 bytes
  • 96. Four 1-3 Byte Ints 11101110 00011101 00000000 00000000 11110101 11101101 01111001 00000000 11111000 01101001 00000000 00000000 00100001 00000000 00000000 00000000 Pad ints to 4 bytes
  • 97. Four 1-3 Byte Ints 00000000 00000000 00011101 11101110 00000000 01111001 11101101 11110101 00000000 00000000 01101001 11111000 00000000 00000000 00000000 00100001 Reverse bytes *not actually necessary since x86 is little endian
  • 98. Four 1-3 Byte Ints 00000000 00000000 00011101 11101110 00000000 01111001 11101101 11110101 00000000 00000000 01101001 11111000 00000000 00000000 00000000 00100001 Clear top bit of each byte
  • 99. Four 1-3 Byte Ints 00000000 00000000 00011101 01101110 00000000 01111001 01101101 01110101 00000000 00000000 01101001 01111000 00000000 00000000 00000000 00100001 Clear top bit of each byte
  • 100. Four 1-3 Byte Ints 00000000 00000000 00011101 01101110 00000000 01111001 01101101 01110101 00000000 00000000 01101001 01111000 00000000 00000000 00000000 00100001 Shift 2nd least significant bytes over by 1 bit (mask/shift/or)
  • 101. Four 1-3 Byte Ints 00000000 00000000 00001110 11101110 00000000 01111001 00110110 11110101 00000000 00000000 00110100 11111000 00000000 00000000 00000000 00100001 Shift 2nd least significant bytes over by 1 bit (mask/shift/or)
  • 102. Four 1-3 Byte Ints 00000000 00000000 00001110 11101110 00000000 01111001 00110110 11110101 00000000 00000000 00110100 11111000 00000000 00000000 00000000 00100001 Shift 3rd least significant bytes over by 2 bits (mask/shift/or)
  • 103. Four 1-3 Byte Ints 00000000 00000000 00001110 11101110 00000000 00011110 01110110 11110101 00000000 00000000 00110100 11111000 00000000 00000000 00000000 00100001 Shift 3rd least significant bytes over by 2 bits (mask/shift/or)
  • 104. Four 1-3 Byte Ints 00000000 00000000 00001110 11101110 00000000 00011110 01110110 11110101 00000000 00000000 00110100 11111000 00000000 00000000 00000000 00100001 Done!
  • 105. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 111101110110
  • 106. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 111101110110
  • 107. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 111101110110
  • 108. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 111101110110
  • 109. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 Decode 2 varints from 9 bytes
  • 110. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 ● Could handle the same way as other cases ● Would require 5 AND operations, 4 shift operations, and 4 OR operations
  • 111. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 ● We can simulate shifting by different amounts with multiplication ● Only needs 1 multiplication, 1 shift, 1 OR, 1 shuffle
  • 112. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 1 5 0 4 0 3 0 2 1 5 0 4 0 3 0 2 Two 1-5 byte ints Treat SIMD register as eight 16 bit registers, loading 1 byte into each. First byte doesn’t need to be shifted.
  • 113. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  • 114. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 11101110 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  • 115. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 11101110 00000000 00000000 00000000 00000000 00000000 00000000 10011101
  • 116. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 11101110 00000000 00000000 00000000 00000000 11110101 00000000 10011101
  • 117. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 11101110 00000000 00000000 11101101 00000000 11110101 00000000 10011101
  • 118. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 11101110 00000011 00000000 11101101 00000000 11110101 00000000 10011101
  • 119. Two 1-5 Byte Ints 11101110 00000011 00000000 11101101 00000000 11110101 00000000 10011101 Clear top bit of each byte
  • 120. Two 1-5 Byte Ints 01101110 00000011 00000000 01101101 00000000 01110101 00000000 00011101 Clear top bit of each byte
  • 121. Two 1-5 Byte Ints 01101110 00000011 * 16 (<< 4) 00000000 01101101 * 32 (<< 5) 00000000 01110101 * 64 (<< 6) 00000000 00011101 * 128 (<< 7) Multiply to shift bits into place
  • 122. Two 1-5 Byte Ints 01101110 00000011 * 16 (<< 4) 00000000 01101101 * 32 (<< 5) 00000000 01110101 * 64 (<< 6) 00000000 00011101 * 128 (<< 7) Multiply to shift bits into place
  • 123. Two 1-5 Byte Ints 11100000 00110000 * 16 (<< 4) 00001101 10100000 * 32 (<< 5) 00011101 01000000 * 64 (<< 6) 00001110 10000000 * 128 (<< 7) Multiply to shift bits into place
  • 124. Two 1-5 Byte Ints 11100000 00110000 00001101 10100000 00011101 01000000 00001110 10000000
  • 125. Two 1-5 Byte Ints 11100000 00110000 00001101 10100000 00011101 01000000 00001110 10000000 Left shift everything by 8 bits
  • 126. Two 1-5 Byte Ints 11100000 00110000 00001101 10100000 00011101 01000000 00001110 10000000 Left shift everything by 8 bits 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  • 127. Two 1-5 Byte Ints 11100000 00110000 00001101 10100000 00011101 01000000 00001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  • 128. Two 1-5 Byte Ints 11100000 00110000 00001101 10100000 00011101 01000000 00001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  • 129. Two 1-5 Byte Ints 11110000 00110000 00001101 10100000 00011101 01000000 00001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  • 130. Two 1-5 Byte Ints 11110000 00110000 00001101 10100000 00011101 01000000 00001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  • 131. Two 1-5 Byte Ints 11110000 00111101 10101101 10100000 00011101 01000000 00001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  • 132. Two 1-5 Byte Ints 11110000 00111101 10101101 10100000 00011101 01000000 00001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  • 133. Two 1-5 Byte Ints 11110000 00111101 10101101 10111101 01011101 01000000 00001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  • 134. Two 1-5 Byte Ints 11110000 00111101 10101101 10111101 01011101 01000000 00001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  • 135. Two 1-5 Byte Ints 11110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  • 136. Two 1-5 Byte Ints 11110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  • 137. Two 1-5 Byte Ints 11110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  • 138. Two 1-5 Byte Ints 11110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  • 139. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  • 140. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  • 141. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  • 142. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  • 143. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  • 144. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  • 145. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  • 146. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 OR in low 7 bits of least significant byte (remember that we stored it in most significant byte position originally)
  • 147. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 11101110 OR in low 7 bits of least significant byte (remember that we stored it in most significant byte position originally)
  • 148. Two 1-5 Byte Ints 00111101 10111101 01001110 11101110 Final result!
  • 149. Two 1-5 Byte Ints 00111101 10111101 01001110 11101110 Final result! Checking my work against initial varint: 11101110 10011101 11110101 11101101 00000011
  • 150. Two 1-5 Byte Ints 00111101 10111101 01001110 11101110 Final result! Checking my work against initial varint: 11101110 10011101 11110101 11101101 00000011
  • 151. Two 1-5 Byte Ints 00111101 10111101 01001110 11101110 Final result! Checking my work against initial varint: 11101110 10011101 11110101 11101101 00000011
  • 152. Two 1-5 Byte Ints 00111101 10111101 01001110 11101110 Final result! Checking my work against initial varint: 11101110 10011101 11110101 11101101 00000011
  • 153. Two 1-5 Byte Ints 00111101 10111101 01001110 11101110 Final result! Checking my work against initial varint: 11101110 10011101 11110101 11101101 00000011
  • 154. Two 1-5 Byte Ints 00111101 10111101 01001110 11101110 Final result! Checking my work against initial varint: 11101110 10011101 11110101 11101101 00000011
  • 157. Q&A