Benoit Rostykus
Machine Learning Researcher
Oct. 10, 2017 - ML Platform Meetup
Scope
Lines of code* (one figure per project compared on the slide; labels lost in extraction): 1,888k / 252k / 2,322k / 110k / 6k
*: git ls-files | xargs cat | wc -l
● 0.05 dev (I spend 5% of my time on it)
● offers a minimal DAG with backprop for feed-forward nets
● sparse data as first class citizen
● arbitrary loss function
● extremely fast on CPU
○ 0 memory allocation
○ lock-free inter-core parallelism
○ LLVM intrinsics for dense ops SIMD vectorization
Performance
● Currently in A/B test, one of the many sub-algorithms used to construct Netflix homepage recommendations
● Training set
○ 33M rows / ~510 nonzeros per row / total dimensionality 7.3k / sparsity = 7%
○ 8 bytes per entry = (index, value) = (uint, float)
○ 16.8B entries, 125GB total
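The training-set figures above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the quoted numbers, assuming a packed 4-byte unsigned index plus a 4-byte float per entry; it suggests the "125GB" total is counted in binary gigabytes and that "sparsity = 7%" refers to the fraction of nonzero features per row.

```python
import struct

# Figures quoted on the slide.
ROWS = 33_000_000   # training rows
NNZ_PER_ROW = 510   # approximate nonzeros per row
DIM = 7_300         # total dimensionality

# Assumed layout: (uint32 index, float32 value), little-endian, no padding.
ENTRY_BYTES = struct.calcsize("<If")

entries = ROWS * NNZ_PER_ROW
total_bytes = entries * ENTRY_BYTES
density = NNZ_PER_ROW / DIM

print(f"bytes per entry: {ENTRY_BYTES}")
print(f"entries:         {entries / 1e9:.1f}B")
print(f"total size:      {total_bytes / 2**30:.0f} GiB")
print(f"nonzero share:   {density:.1%}")
```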
● 4.2 sec per SGD pass (proximal AdaGrad) over 16 cores (r4.8xlarge EC2 instance)
○ per core, per second: 1.9 GB / 491k rows / 250M entries
○ 33 GB / sec, 1 GFLOPS / core
○ 75% of memory bandwidth: DDR4 SDRAM r4.8xlarge max read throughput is ~44 GB/s
Real-world job: sparse logistic regression with positivity constraint on weights
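To make the job description concrete, here is a minimal Python sketch of sparse logistic regression trained with a proximal AdaGrad step under a positivity constraint. It is not vectorflow's D implementation: the function names, the toy data and the hyperparameters are assumptions for illustration. The one point it relies on is that, with AdaGrad's diagonal scaling, the proximal step for the constraint w >= 0 reduces to clipping each updated coordinate at zero.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(rows, dim, epochs=20, lr=0.5, eps=1e-8):
    """rows: list of (label in {0, 1}, [(index, value), ...])."""
    w = [0.0] * dim    # weights, kept non-negative
    g2 = [0.0] * dim   # AdaGrad accumulator of squared gradients
    for _ in range(epochs):
        for y, feats in rows:
            p = sigmoid(sum(w[i] * v for i, v in feats))
            err = p - y  # derivative of the log-loss w.r.t. the logit
            for i, v in feats:  # only the nonzero coordinates are touched
                g = err * v
                g2[i] += g * g
                step = lr / (math.sqrt(g2[i]) + eps)
                # Proximal step for the constraint w >= 0: clip at zero.
                w[i] = max(0.0, w[i] - step * g)
    return w


# Toy data: feature 0 indicates the positive class, feature 1 the negative one.
toy = [(1, [(0, 1.0)]), (0, [(1, 1.0)])] * 50
weights = train(toy, dim=2)
print(weights)
```

On this toy set the weight of feature 1 would go negative without the constraint; the clip holds it at zero while the weight of feature 0 grows.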
Trade-offs
“All non-trivial abstractions, to some degree, are leaky.” - Joel Spolsky
genericity vs. performance
● tensorflow/core/kernels
○ adjust_hue_op.cc
○ sparse_xent_op.cc
○ word2vec_ops.cc
    REGISTER_OP("Skipgram")
        .Deprecated(19,
                    "Moving word2vec into tensorflow_models/tutorials and "
                    "deprecating its ops here as a result")
● RNN unrolling
Design choice: D
● Fact 1: Python is awesome but slow. Fact 2: scientists can’t code in C++.
○ Mainstream solution: a Python frontend on top of an efficient C++ backend
○ Problem: scientists have outsourced technological leverage to C++ coders
○ Scientists might think they need a cluster of GPUs instead of a single box
○ Creates a “division of labor” which hampers innovation at interface
● vectorflow is written in D: a modern systems language
○ Python-like experience for beginners, 100x faster runtime
○ C++ done right for experienced users
○ code / compile / run / debug loop almost as fast as Python
○ statically typed with great type-inference, best-in-class templates
○ amazing LLVM compiler LDC
○ low-level control if needed
■ compile-time evaluation, inline asm
■ manual mem management
● Single language benefits
○ you don’t have to switch language to have efficient code
○ fewer abstractions, less impedance mismatch, fewer bugs
○ faster dev time
[Figure: side-by-side panels labeled D and C++]
Design choice: optimize for latency
● Most DL libraries optimize for throughput, not latency - assume memory move is cheap
○ mini-batch API
○ pass-by-copy by default, gather when sparse
■ computation is assumed to outweigh memory transport cost
● RAM -> GPU memory -> computation -> RAM
■ makes sense for compute heavy, dense problems
● images: convolutions are expensive
● Instead, vectorflow optimizes for low latency - assumes memory move is expensive
○ row-based API : fast query time
○ everything is pre-allocated when the graph is built
○ no memory allocation or copy during forward-prop or backward-prop (RAM is slow)
○ great for low latency problems / sparse or shallow nets: real-time bidding, trading etc.
● shallow => IO bound => CPU (what vectorflow is optimized for)
● deep => compute bound => GPU (what most DL libraries are optimized for)
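The row-based, pre-allocated design described on this slide can be sketched as follows. This is a Python illustration under stated assumptions, not vectorflow's API: `TinyNet`, its layer sizes and its constant weights are hypothetical. Python itself still allocates temporaries for floats, so the sketch only shows the structure: every buffer is created when the net is built, and each query reuses them in place.

```python
from array import array


class TinyNet:
    """Sparse input -> dense hidden layer (ReLU) -> scalar output."""

    def __init__(self, in_dim, hidden):
        # All buffers are allocated once, when the graph is built.
        # Constant weights are placeholders for illustration only.
        self.w1 = [array("f", [0.5] * hidden) for _ in range(in_dim)]
        self.w2 = array("f", [1.0] * hidden)
        self.hidden = array("f", [0.0] * hidden)  # reused for every row

    def predict(self, feats):
        """feats: one row as a list of (index, value) pairs."""
        h = self.hidden
        for j in range(len(h)):
            h[j] = 0.0  # reset in place instead of allocating a new buffer
        for i, v in feats:
            col = self.w1[i]
            for j in range(len(h)):
                h[j] += v * col[j]  # only the nonzero inputs are visited
        out = 0.0
        for j in range(len(h)):
            if h[j] > 0.0:  # ReLU
                out += self.w2[j] * h[j]
        return out


net = TinyNet(in_dim=4, hidden=3)
buffer_id = id(net.hidden)
first = net.predict([(0, 1.0), (2, 2.0)])
second = net.predict([(1, -1.0)])
print(first, second, id(net.hidden) == buffer_id)
```

Each call handles a single row, which is what keeps query latency low: there is no batch to assemble and no input to copy before the forward pass starts.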
Design choice: leverage templates
● Data
○ Format agnostic: “bring your own data”
○ Move the code to the data, not the opposite
○ Loose requirement on schema
○ Library just expects an iterator
■ in-memory or out-of-core learning possible
○ Compile-time mapping of data fields to DAG roots to avoid runtime copy
○ Netflix internal data-adapter example: stream parquet-encoded s3-backed Hive tables
● Loss callback
○ Easily implement arbitrary loss functions
○ Compile-time specialization of learning logic based on callback signature
○ Gradient buffer reference to avoid allocation
■ Can be dense or sparse!
Example: sparse auto-encoder (sparse cross-entropy)
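The two ideas on this slide, "the library just expects an iterator" and "the loss is a callback that writes into a gradient buffer", can be sketched in Python as below. The names `learn`, `squared_loss` and `quantile_loss` are hypothetical, and the model is a plain linear one with a single-slot gradient buffer. Python cannot express the compile-time specialization that D templates provide, so only the calling convention is shown.

```python
def learn(rows, dim, loss_grad, epochs=100, lr=0.1):
    """rows: any re-iterable of (label, [(index, value), ...])."""
    w = [0.0] * dim
    grad = [0.0]  # gradient buffer, allocated once and handed to the callback
    for _ in range(epochs):
        for label, feats in rows:
            pred = sum(w[i] * v for i, v in feats)
            loss_grad(pred, label, grad)  # callback fills grad[0] = dLoss/dPred
            for i, v in feats:
                w[i] -= lr * grad[0] * v
    return w


def squared_loss(pred, label, grad):
    grad[0] = pred - label
    return 0.5 * (pred - label) ** 2


def quantile_loss(q):
    """Pinball loss for quantile q, returned as a callback."""
    def callback(pred, label, grad):
        diff = label - pred
        grad[0] = -q if diff > 0 else 1.0 - q
        return max(q * diff, (q - 1.0) * diff)
    return callback


# Noise-free linear data: label = 2 * x0 + 3 * x1.
linear_rows = [(2.0, [(0, 1.0)]), (3.0, [(1, 1.0)]), (5.0, [(0, 1.0), (1, 1.0)])]
w_ls = learn(linear_rows, dim=2, loss_grad=squared_loss)

# A single constant feature: the fitted weight estimates the median label.
median_rows = [(y, [(0, 1.0)]) for y in (1.0, 2.0, 3.0, 4.0, 100.0)]
w_med = learn(median_rows, dim=1, loss_grad=quantile_loss(0.5))
print(w_ls, w_med)
```

Swapping the loss changes what is learned without touching the training loop: the squared loss recovers the linear coefficients, while the pinball loss at q = 0.5 settles near the median and largely ignores the outlier label.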
Design choice: parallelism
● Distributed learning...
○ … is hard to implement & debug
○ … trades convergence speed for lower communication cost
■ meta-algorithms such as CoCoA (Berkeley), AIDE (CMU) help
● Don’t distribute over multiple machines unless you need it
● Intra-core parallelism: SIMD for all dense ops
● Inter-core parallelism: Hogwild! - asynchronous SGD
○ Data parallelism: each core iterates over a data chunk
○ Lock-free strategy, pretends each core is alone - race conditions will happen
○ Avoids the need for a meta-algorithm
○ Works great as long as read/write patterns are sparse enough
■ More likely to be true in the sparse bottom layer
○ Works surprisingly well on dense problems too
○ Free: only cost is CPU cache line thrashing
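The Hogwild! strategy described above can be sketched in a few lines of Python. This shows the structure only and makes several assumptions: `hogwild_sgd`, the synthetic data and the hyperparameters are hypothetical, and CPython's global interpreter lock means the threads interleave rather than run truly in parallel, so there is no speed-up to observe here. Each worker iterates over its own data chunk and updates the shared weights with no lock at all.

```python
import random
import threading


def hogwild_sgd(rows, dim, n_threads=4, epochs=5, lr=0.2):
    w = [0.0] * dim  # shared by every thread, never locked

    def worker(chunk):
        for _ in range(epochs):
            for label, feats in chunk:
                err = sum(w[i] * v for i, v in feats) - label
                for i, v in feats:
                    # Unsynchronized read-modify-write: updates may be lost.
                    w[i] -= lr * err * v

    # Data parallelism: each thread gets its own slice of the rows.
    chunks = [rows[t::n_threads] for t in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w


# Synthetic noise-free sparse regression: 3 nonzeros out of 10 per row.
rng = random.Random(0)
DIM = 10
true_w = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
rows = []
for _ in range(2000):
    idx = rng.sample(range(DIM), 3)
    feats = [(i, rng.uniform(-1.0, 1.0)) for i in idx]
    rows.append((sum(true_w[i] * v for i, v in feats), feats))

w = hogwild_sgd(rows, DIM)
max_err = max(abs(a - b) for a, b in zip(w, true_w))
print(f"max weight error: {max_err:.4f}")
```

Because each row touches only a few coordinates, two workers rarely write the same weight at the same moment, which is why the lock-free scheme still converges.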
small > big, simple > complex
● Distributed as source-code, not pre-compiled library
○ Compiling arch = running arch: always optimized for the machine it runs on
■ leverages LLVM as much as possible, no handwritten-SIMD
● No third party dependencies
○ No-brainer to install, just need a D compiler
○ Works everywhere
● Small code base, easy to understand and hack
● Polar bear friendly
Some Netflix use-cases:
● Survival regression
● Quantile regression
● Binary/multiclass classification
● Causal inference
● Auto-encoder
● ...
Roadmap:
● more complex nodes and deeper sparsity support
● algebraic API (mix of PyTorch / TF through operator overloading)
● RNN, more optimizers (SVRG etc.)
● keep it simple & small - not meant to be an ML kitchen sink
demo
Thank you!
links:

Netflix VectorFlow at ML Platform Meetup Oct 2017
