Understanding MLOps

2021.4.22
ChunMK
(chunmk80@gmail.com)

Machine learning: myths and reality

Understanding the basic concepts of machine learning

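What is a feature (store)?
• In machine learning, a feature is an individual independent variable that acts as an input to the system.
• When making predictions, models use features that have been prepared for that purpose. Feature engineering can derive new features from existing ones. (For example, one column of a dataset can be regarded as one feature, also called a "variable" or "attribute", and the number of features is referred to as the dimensions of the data.)
In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon being observed.
Feature → Feature engineering → Feature store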

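What is a feature (store)?
• A feature is a measurable property of the entity being analyzed.
• In a dataset, features appear as columns, as in the image of Titanic passenger data shown on this slide.
The image contains a slice of a public dataset with information on the passengers of the Titanic's ill-fated maiden voyage. Each feature, or column, represents a measurable piece of data that can be used for analysis, such as name, age, sex, and fare. Features are also called "variables" or "attributes". Depending on what is being analyzed, the features included in a dataset can vary widely.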
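To make the idea of deriving new features from existing ones concrete, here is a minimal sketch. The three passenger records are made-up values shaped like the Titanic dataset, and the derived feature names `family_size` and `is_child` are hypothetical choices for illustration, not taken from the slides.

```python
from typing import Dict, List

# Made-up rows shaped like the Titanic dataset; each key is one feature (column).
passengers: List[Dict] = [
    {"name": "Passenger A", "age": 29, "sex": "female", "fare": 71.3, "sibsp": 1, "parch": 0},
    {"name": "Passenger B", "age": 8, "sex": "male", "fare": 21.1, "sibsp": 3, "parch": 1},
    {"name": "Passenger C", "age": 40, "sex": "male", "fare": 8.1, "sibsp": 0, "parch": 0},
]


def engineer_features(row: Dict) -> Dict:
    """Return a copy of the row with two derived (hypothetical) features added."""
    out = dict(row)
    # New feature built from two existing ones: relatives aboard plus the passenger.
    out["family_size"] = row["sibsp"] + row["parch"] + 1
    # New feature built by thresholding an existing numeric feature.
    out["is_child"] = row["age"] < 12
    return out


engineered = [engineer_features(p) for p in passengers]
# The number of features per row is the dimensionality of the dataset.
dimensions = len(engineered[0])
```

A feature store would typically persist the output of a function like `engineer_features` so that training and serving read the same values; this sketch only shows the derivation step.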

What is a model?
Prediction Algorithm
https://medium.com/brandlitic/difference-between-ml-algorithm-and-model-801a798a6dc0
Feature

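Drift issues
Model Drift
Types of Model Drift
There are three main types of model drift:
1. Concept drift
2. Data drift
3. Upstream data changes
Concept drift is a type of model drift in which the properties of the dependent variable change. A fraud model is an example: what is classified as "fraud" changes over time.
Data drift is a type of model drift in which the properties of the independent variables change. Examples include changes in the data caused by seasonality, shifts in consumer preferences, or the addition of new products.
Upstream data changes are operational changes in the data pipeline. One example is a feature that is no longer generated, which results in missing values. Another is a change in measurement (e.g., from miles to kilometers).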
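Two of these drift types can be checked with a few lines of code. The sketch below is one possible approach, not a method named in the slides: it uses the Population Stability Index (PSI) as a data drift score for a single numeric feature, and a simple presence check for upstream changes. The bin count, the smoothing constant, and the function names are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def histogram_shares(values: Sequence[float], edges: List[float]) -> List[float]:
    """Share of values per bin; values outside the edges fall into the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    return [c / len(values) for c in counts]


def population_stability_index(reference: Sequence[float],
                               current: Sequence[float],
                               bins: int = 5) -> float:
    """PSI between a reference sample and a current sample of one feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    eps = 1e-6  # smoothing so that empty bins do not produce log(0)
    ref = histogram_shares(reference, edges)
    cur = histogram_shares(current, edges)
    return sum((c - r) * math.log((c + eps) / (r + eps)) for r, c in zip(ref, cur))


def missing_features(row: Dict, expected: Sequence[str]) -> List[str]:
    """Upstream-change check: expected features that are absent or None."""
    return [name for name in expected if row.get(name) is None]


reference = [float(x) for x in range(100)]
shifted = [x + 50.0 for x in reference]  # simulated drift in the feature
psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
```

A PSI of zero means the two samples fill the bins identically; larger values mean the distributions differ more. What threshold should raise an alert is a project-specific decision. Concept drift cannot be detected from inputs alone, since it needs labelled outcomes to compare against predictions.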

Do we need to understand the lifecycle of a machine learning project?

Machine Learning Project Life Cycle
Business Problem → Goal Definition → Data Collection & Preparation → Feature Engineering → Model Training → Model Evaluation → Model Deployment → Model Serving → Model Monitoring → Model Maintenance

Challenges of machine learning systems

The traditional way of developing ML models
Manual experiment steps: Local Data → Exploratory Data Analysis → Data Preparation → Model Training → Model Evaluation → Model Analysis → Trained Model
Hand-off: Trained Model → Storage → Model Deployment
Training belongs to Data Science; Serving belongs to IT.

Challenges
▪ Time Consuming
▪ Manual
▪ Inflexible
▪ Error Prone
▪ Not Reusable

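Challenges of machine learning systems
1. Dependency issues
2. Communication issues
3. Reproducibility issues
4. Transparency and workflow reusability issues
• Not everyone speaks the same language.
• The machine learning lifecycle involves people from business, data science, and IT teams, but none of these groups use the same tools, and in most cases they share only basic communication skills as a common baseline.
• When access control for data and models is not effective, it is hard to keep track of who did what.
• When other team members do not understand the work or lack clear access rights, the transparency of machine learning results suffers.
• Data changes constantly, and business requirements change frequently as well.
• Results must be continuously relayed back to the business to confirm that the model's behavior in production, on production data, matches expectations and (importantly) solves the original problem or meets the original goal.
• Without a clear, reproducible workflow, it is very common for people working in different parts of the organization to unknowingly build exactly the same solution.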

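Not only does data change constantly; business needs change too
"The results of ML models (i.e., mathematical models based on sample data that output predictions) are continually relayed back to the business to confirm that the reality of the model matches expectations and, critically, that it solves the original problem or meets the original goal."
Suppose a data team is handed a business problem and must deliver a solution within six months.
The team spends months cleaning data, building models, and visualizing information according to the initial project parameters.
Six months later, the data team presents its work to the business team, and the response is "Great!"
Unfortunately, the source data has changed since the project began, and customer behavior has changed with it.
Six months of effort and time have been wasted, and the team goes back to the start.
Four more months pass while the data is corrected and readjusted, only for the team to find that the original project parameters have changed again.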

Machine learning operations: who is involved?

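ML Team members
Roles: Data Scientist, ML Engineer, Subject Matter Expert (SME), Software Engineer, DevOps Engineer, Data Engineer, Data Analyst
Responsibilities listed on the slide:
▪ Asks the key business questions
▪ Confirms that model performance meets business needs and goals
▪ Scales ML models in production
▪ Improves and optimizes the architecture for ML models in production
▪ Performs data analysis and exploratory data analysis
▪ Supports the development of features for use in ML models
▪ Optimizes and builds the data extraction used by the ML process (ETL)
▪ Develops models that can answer the business questions raised by SMEs
▪ Tests models, delivers them to production, and creates business value
▪ Reviews model results and accuracy, and retrains models
▪ Develops APIs or applications that work with ML models
▪ Verifies that ML models work correctly on other software platforms
▪ Manages the security and performance of the architecture that supports ML models
▪ Handles CI/CD pipelines for ML models in all environments
▪ Sets up the infrastructure on which data scientists and machine learning engineers do their work
▪ Is responsible for data storage and data transfer, at the right volume and the right speed for the required use
▪ Is typically a software engineer who specializes in data pipelines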

Not every team member speaks the same language.

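A machine learning system (model) development case
Scenario: a bank builds a credit card fraud detection system.
• The data scientist develops a model that can, in theory, detect credit card transaction fraud at the bank.
• The machine learning engineer then deploys the model in the field and verifies that it can handle billions of transactions per day.
• The data engineer is responsible for making sure that all transaction data handled by the bank is stored correctly. If the system must process a million transactions per second, the data engineer builds a data pipeline that delivers all information to the right part of the system on time, without delays or bottlenecks.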

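A realistic picture of the ML model lifecycle inside an average organization today, which involves many different people with completely different skill sets who often use completely different tools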
What Is MLOps? by Mark Treveil and Lynn Heidmann, O’Reilly Media, 2020.11

ML Team Communication
Data Scientist, ML Engineer, Subject Matter Expert, Software Engineer, DevOps Engineer, Data Engineer, Data Analyst
Siloed team members: not everyone speaks the same language

Siloed Teams: Communication Between Teams
Enterprise ML work teams: Team A, Team B, Team C, Team D, Team …
Not everyone speaks the same language

Why machine learning operations (MLOps)?

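The buzz: MLOps – New Buzzword
Google Trends – MLOps vs. ML
Since 2018, solutions have been announced here and there under the banner of MLOps.
They carry the same label, but what they actually contain varies widely.
The field is now being organized around the notion of an "MLOps maturity level".
MLOps → MLOps maturity level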

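Top 10 enterprise ML trends for 2021 (global)
Source: compiled from Algorithmia, 2021 enterprise trends in machine learning survey
1. Priority and budgets for AI/ML have increased significantly year over year.
2. Most organizations have more than 25 models in production, and there is a gap between the AI/ML "haves" and "have-nots".
3. Organizations are expanding into a wider range of AI/ML use cases, with a particular focus on process automation and customer experience.
4. Governance is the top challenge for AI/ML deployment, with more than half of all organizations ranking it as a concern.
5. The second biggest AI/ML challenge is technology integration and compatibility, with 49% of organizations ranking it as a concern.
6. Organizational alignment is the biggest gap in achieving AI/ML maturity.
7. Successful AI/ML initiatives require organizational alignment across multiple decision makers and business functions.
8. The time required to deploy a model is increasing, with 64% of all organizations taking a month or longer.
9. 38% of organizations spend more than 50% of their data scientists' time on deployment, and this gets worse with scale.
10. Organizations that use a third-party ML operations (MLOps) solution save money and spend less time on model deployment than those that build their own solution.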

Target Persona changing
Case of Kubeflow

Evolution of MLOps (still evolving)
Adam Breindel, https://github.com/adbreind/open-standard-models-2019/blob/master/01-Intro.ipynb
Timeline: Pre-History Age → Stone Age → Bronze Age → MLOps Gold Rush Age (axis marks: 2000, 2015, 2018)

Pre-History Age: Proprietary Inference Servers
Using proprietary tools to perform modeling and inference
➢ SAS
➢ SPSS
➢ FICO

Stone Age: The Rise of Open Source Data Science Tools
… , attempt to wrap the data science stack in a lightweight web service framework, and put it into production.
Python:
➢ SciPy stack
➢ scikit-learn
➢ TensorFlow etc.
R:
➢ dplyr
➢ ggplot2
➢ Etc.
➢ Spark, H2O, others…

Bronze Age: Containerization to the rescue
Containerized the "Stone Age" approach, making it easy to scale, robust, etc.

MLOps Gold Rush Age: MLOps Platform
➢ Dockerized open source ML stacks
➢ Deployed them on premise or in the cloud via Kubernetes
➢ And providing some manageability (MLOps)

MLOps Principles – 'Continuous X'
MLOps is an ML engineering culture that includes the following practices:
• Continuous Integration (CI) extends the testing and validation of code and components by adding the testing and validation of data and models.
• Continuous Delivery (CD) concerns the delivery of an ML training pipeline that automatically deploys another service: the ML model prediction service.
• Continuous Training (CT) is a property unique to ML systems: it automatically retrains ML models for redeployment.
• Continuous Monitoring (CM) concerns the monitoring of production data and model performance metrics, which are tied to business metrics.
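The CI bullet above says that ML adds data and model checks on top of ordinary code tests. A minimal sketch of such a gate might look like the following. The schema, the feature names, and the 0.8 accuracy bar are all hypothetical values chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical schema: feature name -> (expected type, minimum, maximum).
SCHEMA: Dict[str, Tuple[type, float, float]] = {
    "age": (float, 0.0, 120.0),
    "fare": (float, 0.0, 1000.0),
}


def validate_rows(rows: Sequence[Dict], schema=SCHEMA) -> List[str]:
    """Data validation: return a list of human-readable schema violations."""
    errors = []
    for i, row in enumerate(rows):
        for name, (kind, lo, hi) in schema.items():
            value = row.get(name)
            if not isinstance(value, kind):
                errors.append(f"row {i}: {name} is missing or not {kind.__name__}")
            elif not lo <= value <= hi:
                errors.append(f"row {i}: {name}={value} outside [{lo}, {hi}]")
    return errors


def accuracy(model: Callable[[Dict], bool], rows: Sequence[Dict], labels: Sequence[bool]) -> float:
    """Model validation: fraction of rows where the prediction equals the label."""
    hits = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return hits / len(labels)


def ci_gate(model, rows, labels, min_accuracy: float = 0.8) -> bool:
    """Pass only if the data is valid and the model meets the quality bar."""
    return not validate_rows(rows) and accuracy(model, rows, labels) >= min_accuracy
```

In a CI system, a function like `ci_gate` would run as a test next to the unit tests for the pipeline code, so that a bad dataset or a weak model fails the build the same way a code bug does.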

Maturity of MLOps
MLOps maturity model (*Azure Machine Learning: MLOps Maturity Model)
Level Description
0 No MLOps
1 DevOps but no MLOps
2 Automated Training
3 Automated Model Deployment
4 Automated Operations (full MLOps)
Google's MLOps Guidelines (2020.11 → 2021.4 …)
▪ MLOps level 0: Manual process. The need for model training and deployment is formally recognized, but both are often carried out manually and ad hoc through scripts and interactive processes. This level typically has no continuous integration and no continuous deployment.
▪ MLOps level 1: ML pipeline automation. This level introduces a pipeline for continuous training. Data and model validation are automated, and a trigger retrains the model on new data when performance degrades.
▪ MLOps level 2: CI/CD pipeline automation. The ML workflow is automated to the point where data scientists can update both the model and the pipeline with less developer intervention.
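The level 1 description mentions a trigger that retrains the model when performance degrades. One simple way to express such a trigger is a rolling average over recent evaluation scores, sketched below. The class name, the window size, and the threshold are assumptions made for illustration; real systems may use different rules.

```python
from collections import deque


class RetrainTrigger:
    """Fires when the rolling mean of recent scores drops below a threshold."""

    def __init__(self, threshold: float, window: int = 3) -> None:
        self.threshold = threshold
        self.scores = deque(maxlen=window)  # keeps only the newest scores

    def observe(self, score: float) -> bool:
        """Record a score and report whether retraining should start."""
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        return window_full and sum(self.scores) / len(self.scores) < self.threshold


trigger = RetrainTrigger(threshold=0.8, window=3)
decisions = [trigger.observe(s) for s in [0.9, 0.9, 0.85, 0.7, 0.6]]
```

Averaging over a window keeps a single noisy evaluation from launching a retraining run, at the cost of reacting a few observations later.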

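Maturity of MLOps (Azure Machine Learning: MLOps Maturity Model)

Level 0: No MLOps
Highlights:
• Difficult to manage the full machine learning model lifecycle
• Teams are disparate and releases are painful
• Most systems exist as "black boxes", with little feedback during or after deployment
Technology:
• Manual builds and deployments
• Manual testing of model and application
• No centralized tracking of model performance
• Model training is manual

Level 1: DevOps but no MLOps
Highlights:
• Releases are less painful than with No MLOps, but rely on the data team for every new model
• Feedback on how well a model performs in production is still limited
• Difficult to trace or reproduce results
Technology:
• Automated builds
• Automated tests for application code

Level 2: Automated Training
Highlights:
• Training environment is fully managed and traceable
• Easy to reproduce a model
• Releases are manual, but low friction
Technology:
• Automated model training
• Centralized tracking of model training performance
• Model management

Level 3: Automated Model Deployment
Highlights:
• Releases are low friction and automatic
• Full traceability from deployment back to the original data
• Entire environment managed: train > test > production
Technology:
• Integrated A/B testing of model performance for deployment
• Automated tests for all code
• Centralized tracking of model training performance

Level 4: Full MLOps Automated Operations
Highlights:
• The full system is automated and easily monitored
• Production systems provide information on how to improve and, in some cases, improve automatically with new models
• Approaching a zero-downtime system
Technology:
• Automated model training and testing
• Verbose, centralized metrics from the deployed model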

How is a machine learning operations (MLOps) system structured?

Components of MLOps
• Machine Learning
• DevOps (IT)
• Data Engineering
MLOps looks to increase automation and improve the quality of production ML.
MLOps is defined as "a practice for collaboration and communication between data scientists and operations professionals to help manage production ML (or deep learning) lifecycle." (bmc.com/blogs/mlops-machine-learning-ops)
"An ML engineering culture and practice that aims at unifying ML system development (Dev) and ML system operation (Ops)" - Google

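Benefits of MLOps
• Rapid innovation through robust machine learning lifecycle management
• Reproducible workflows and models
• Easy deployment of high-precision models in any location
• Effective management of the entire machine learning lifecycle
• Machine learning resource management system and control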
https://www.bmc.com/blogs/mlops-machine-learning-ops/

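The difficulties with MLOps
• Deployment and automation
• Reproducibility of models and predictions
• Diagnostics
• Governance and regulatory compliance
• Scalability
• Collaboration
• Business uses
• Monitoring and management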

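MLOps stages that reflect the process of ML pipeline automation
Each entry lists the MLOps stage, followed by the output of the stage execution.
1. Development and experimentation (ML algorithms, new ML models)
   Output: source code for pipelines: data extraction, validation, preparation, model training, model evaluation, model testing
2. Pipeline continuous integration (build source code and run tests)
   Output: pipeline components to be deployed: packages and executables
3. Pipeline continuous delivery (deploy pipelines to the target environment)
   Output: deployed pipeline with the new implementation of the model
4. Automated triggering (the pipeline runs automatically in production, on a schedule or a trigger)
   Output: trained model that is stored in the model registry
5. Model continuous delivery (model serving for prediction)
   Output: deployed model prediction service (e.g., a model exposed as a REST API)
6. Monitoring (collecting data about model performance on live data)
   Output: trigger to execute the pipeline or to start a new experiment cycle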

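MLOps Setup Components
Each entry lists the component, followed by its description.
Source Control: versioning of code, data, and ML model artifacts
Test & Build Services: using CI tools for (1) quality assurance of all ML artifacts and (2) building packages and executables for pipelines
Deployment Services: using CD tools to deploy pipelines to the target environment
Model Registry: a registry for storing already trained ML models
Feature Store: preprocessing input data into features to be used in the model training pipeline and during model serving
ML Metadata Store: tracking metadata of model training, such as model name, parameters, training data, test data, and metric results
ML Pipeline Orchestrator: automating the steps of ML experiments
ml-ops.org/content/mlops-principles
MLOps Principles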
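To illustrate how a model registry and an ML metadata store fit together, here is a minimal in-memory sketch. The class and method names are hypothetical and do not correspond to any particular product; a real registry would persist models and metadata to durable storage.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ModelVersion:
    """One registered model together with the metadata of its training run."""
    name: str
    version: int
    model: Any
    metadata: Dict[str, Any] = field(default_factory=dict)


class ModelRegistry:
    """Stores trained models under a name, with increasing version numbers."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ModelVersion]] = {}

    def register(self, name: str, model: Any, **metadata: Any) -> ModelVersion:
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(name, len(versions) + 1, model, dict(metadata))
        versions.append(entry)
        return entry

    def latest(self, name: str) -> ModelVersion:
        return self._versions[name][-1]

    def get(self, name: str, version: int) -> ModelVersion:
        return self._versions[name][version - 1]


registry = ModelRegistry()
registry.register("fraud", {"weights": [0.1]}, params={"depth": 3}, metrics={"f1": 0.7})
registry.register("fraud", {"weights": [0.2]}, params={"depth": 5}, metrics={"f1": 0.8})
```

Keeping parameters and metric results next to each version is what later allows a deployed model to be traced back to the run that produced it.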

Iterative-Incremental Process in MLOps
Design: Requirements Engineering; ML Use Case Prioritization; Data Availability Check
Model Development: Data Engineering; ML Model Engineering; Model Testing & Validation
Operations: ML Model Deployment; CI/CD Pipelines; Monitoring & Triggering
ml-ops.org/content/mlops-principles
MLOps Principles

MLOps is a Practice, Not a Tool
Continuous feedback loops with an MLOps workflow
Workflow: Experimentation & Development; Training Pipeline; Continuous Training; Model Serving Pipeline; Continuous Evaluation; Traceability & Explainability
Stores: Code Repository; Artifact Repository; Trained Models & ML Metadata; Deployed Models; Model Logs
https://medium.com/technoesis/mlops-is-a-practice-not-a-tool-41674c5bdad7

ML Engineering & Operations
Each entry lists the activities, followed by the roles involved.
Business Objective: Product Manager, Subject Matter Expert
Data Acquisition; Exploratory Data Analysis: Data Engineer
Model Development (Data Preparation & Processing; Feature Engineering; Model Training/Experimentation; Model Analysis & Evaluation): Data Scientist
Runtime Environment; Risk Assessment; Final Model Performance Analysis: ML Architect + Data Engineer
Autoscaling; Containerization (Docker/Kubernetes); CI/CD Pipeline: Data Engineer + DevOps
Logging/Scheduling; Online Monitoring; Performance Degradation Checker: DevOps + Data Scientist

Returning to Google's MLOps maturity levels,

ML Solution Lifecycle
Experimentation/Development → Continuous Training → Model CI/CD → Continuous Monitoring
Phases: Training, Serving

Automated E2E Pipeline: Reliable & Repeatable Training
Orchestrated Experiment: Development Datasets → Data Extraction → Data Validation → Data Preparation → Model Training → Model Evaluation → Model Validation. Output: Source Code → Source Repository
Training Pipeline CI/CD: Build Components & Pipeline → Run Automated Tests → Tag and Store Artifacts → Deploy to Target Environment. Output: ML Pipeline Artifacts → Artifact Store
Continuous Training: Training Datasets → Data Extraction → Data Validation → Data Preparation → Model Training → Model Evaluation → Model Validation. Output: Trained Models → Model Registry

Reliable & Monitored Serving
Model Deployment CI/CD: Source Repository and Model Registry → Build Prediction Service → Run Automated Tests → Deploy to Target Environment
Serving Infrastructure: Live Data → Predict, Explain, Monitor, Evaluate
Log Store: Performance & Event Logs
ML Metadata
Evaluations, Data Drift and Concept Drift notification

Putting it all together: E2E View
Stages: Experimentation/Development → Training Pipeline CI/CD → Continuous Training → Model Deployment CI/CD → Serving & Monitoring
Passed between stages: Code and Configuration → Pipeline Artifacts → Trained Model → Model Deployment → Serving Logs
Stores: Code Repository; Artifact Repository; Model Repository; Serving Infrastructure; Logs; ML Metadata

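MLOps Level 0: manual process
ML (Experimentation/Development/Test): Offline Data → Data Extraction and Analysis → Data Preparation → Model Training → Model Evaluation and Validation → Trained Model → Model Registry
Ops (Staging/Preproduction/Production): Model Registry → Model Serving → Prediction Service
MLOps level 0 is common in many businesses that are beginning to apply ML to their use cases. This manual, data-scientist-driven process may be sufficient when models are rarely changed or retrained. In practice, however, models often break when they are deployed in the real world, because they fail to adapt to changes in the dynamics of the environment or to changes in the data that describes the environment.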

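MLOps Level 1: ML pipeline automation
ML (experimentation): Data Analysis → Data Validation → Data Preparation → Model Training → Model Evaluation → Model Validation → Model Analysis. Output: Source Code → Source Repository → Pipeline Deployment
Ops (production), Automated Pipeline: Data Extraction → Data Validation → Data Preparation → Model Training → Model Evaluation → Model Validation → Trained Model → Model Registry → CD: Model Serving → Prediction Service
Supporting components: Feature Store; ML Metadata Store; Performance Monitoring → Trigger
Assume that new pipeline implementations are not deployed frequently and that only a few pipelines are managed. In that case, the pipeline and its components are usually tested manually. New pipeline implementations are also deployed manually: the tested source code for the pipeline is submitted to the IT team, which deploys it to the target environment. This setup is suitable when deploying new models based on new data rather than on new ML ideas.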

MLOps Level 2: CI/CD pipeline automation
ML (experimentation): Data Analysis → Model Analysis. Output: Source Code → Source Repository
CI: Build, Test & Package Pipeline Components → Package → CD: Pipeline Deployment
Ops (production), Automated Pipeline: Data Extraction → Data Validation → Data Preparation → Model Training → Model Evaluation → Model Validation → Trained Model → Model Registry → CD: Model Serving → Prediction Service
Supporting components: Feature Store; ML Metadata Store; Performance Monitoring → Trigger
Implementing ML in a production environment does not only mean deploying a model as an API for prediction. It means deploying an ML pipeline that can automate the retraining and deployment of new models. Setting up a CI/CD system makes it possible to test and deploy new pipeline implementations automatically, and such a system lets you cope with rapid changes in the data and the business environment.

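MLOps Level 2: CI/CD pipeline automation
① Development and experimentation: iteratively try out new ML algorithms and new modeling in which the experiment steps are orchestrated. The output of this stage is the source code of the ML pipeline steps, which is pushed to a source repository.
② Pipeline continuous integration: build the source code and run various tests. The outputs of this stage are the pipeline components (packages, executables, and artifacts) to be deployed in a later stage.
③ Pipeline continuous delivery: deploy the artifacts produced by the CI stage to the target environment. The output of this stage is a deployed pipeline that contains the new implementation of the model.
④ Automated triggering: the pipeline runs automatically in production, on a schedule or in response to a trigger. The output of this stage is a trained model that is pushed to the model registry.
⑤ Model continuous delivery: serve the trained model as a prediction service. The output of this stage is a deployed model prediction service.
⑥ Monitoring: collect statistics on model performance based on live data. The output of this stage is a trigger to execute the pipeline or to start a new experiment cycle.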
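The automated-triggering stage can be pictured as a list of steps that share one context and end with a validation gate in front of the model registry. The sketch below is a toy version under stated assumptions: the step names, the `error_budget` key, and the use of a plain list as the registry are all hypothetical, and "training" is reduced to fitting a single slope.

```python
from typing import Callable, Dict, List

Step = Callable[[Dict], Dict]


def run_pipeline(steps: List[Step], context: Dict) -> Dict:
    """Run each step in order; every step receives and returns the shared context."""
    for step in steps:
        context = step(context)
    return context


def extract(ctx: Dict) -> Dict:
    ctx["rows"] = [(x, 2 * x) for x in range(10)]  # stand-in for data extraction
    return ctx


def train(ctx: Dict) -> Dict:
    # "Training" here is a least-squares slope through the origin.
    num = sum(x * y for x, y in ctx["rows"])
    den = sum(x * x for x, _ in ctx["rows"])
    ctx["model"] = {"slope": num / den}
    return ctx


def evaluate(ctx: Dict) -> Dict:
    slope = ctx["model"]["slope"]
    ctx["max_error"] = max(abs(y - slope * x) for x, y in ctx["rows"])
    return ctx


def push_if_valid(ctx: Dict) -> Dict:
    # Validation gate: only models within the error budget reach the registry.
    if ctx["max_error"] <= ctx["error_budget"]:
        ctx["registry"].append(ctx["model"])
    return ctx


registry: List[Dict] = []
result = run_pipeline(
    [extract, train, evaluate, push_if_valid],
    {"registry": registry, "error_budget": 0.01},
)
```

In a real setup, a scheduler or a monitoring trigger would call `run_pipeline`, and the CI/CD stages before it would be responsible for building and deploying the step functions themselves.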

Understanding ML Model Operationalization Management

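Core Components of ML Model Operationalization Management Solutions
Cognilytica Research, ML Model Management & Operations ("MLOps") 2020 - Managing the Machine Learning Model Lifecycle, February 28, 2020
Capabilities required for MLOps model lifecycle management:
• Handling the model development process and its artifacts, including output models, training datasets, test and validation datasets, validation outputs, hyperparameter settings, ensemble models, and other key artifacts
• Managing retraining pipelines
• Handling model deployment and model scaling requirements for a variety of endpoints, including single devices, on-premises, edge, server, and cloud, along with other operational requirements for batch, stream, real-time, or on-demand use
• Version control of all model assets
• Managing ensembles and hyperparameter configurations and settings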

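Meeting diverse requirements for operating models across a wide range of environments
• Beyond handling the lifecycle of a particular model iteration, an MLOps solution must handle frequent iteration and versioning of models across multiple operational endpoints.
• Not only is the model itself iterated and versioned; many other artifacts of model development, including training datasets, hyperparameter settings, and output model versions, are managed in the same way.
• Each of these must be handled by the MLOps system and communicated appropriately to model consumers.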

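An effective MLOps solution includes the following model monitoring capabilities:
• Measurement of model latency, performance time, quantity of requests, prediction errors and performance, accuracy, recall, F1, and various other aspects
• Visibility into the data sent to the model, various measures of effectiveness, failure logs, and audit data
• Visibility into how the model is being used, which is useful for training future versions
• "Model drift" measurement, which tracks the decline in model performance over time, and "data drift" measurement, which tracks changes in the data that affect performance over time
• More effective MLOps solutions also measure changes between time periods and monitor metrics across slices, user cohorts, operational environments, and other segments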
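Several of the metrics named above have simple definitions. The sketch below computes precision, recall, and F1 for a binary classifier and breaks them down by slice, as the last bullet suggests. The slice names and the sample records are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def precision_recall_f1(pairs: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Binary metrics from (actual, predicted) pairs, where 1 is the positive class."""
    tp = fp = fn = 0
    for actual, predicted in pairs:
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif actual == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def metrics_by_slice(records: Iterable[Tuple[str, int, int]]) -> Dict[str, Dict[str, float]]:
    """Group (slice_name, actual, predicted) records and score each slice separately."""
    groups: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for slice_name, actual, predicted in records:
        groups[slice_name].append((actual, predicted))
    return {name: precision_recall_f1(pairs) for name, pairs in groups.items()}


# Made-up monitoring records: (slice, actual label, predicted label).
records = [("web", 1, 1), ("web", 0, 1), ("web", 1, 0), ("app", 1, 1), ("app", 0, 0)]
per_slice = metrics_by_slice(records)
```

Scoring slices separately matters because an overall metric can look healthy while one cohort, here the hypothetical "web" slice, performs noticeably worse.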

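An MLOps system with effective model governance provides the following:
• Model access control, authorization, and security
• Model provenance and auditing, including documentation of model training, testing, and deployment
• Records of the training, test, and validation sets used
• Logging of accuracy measurements over time, along with measurements of the data used
• Version history and model version usage
• Recording of metadata and artifacts to support audit trails
• Records of the users who approved model operation and of the users involved in model development and training
• Customizable views of data that differ by user role, such as operations, model development, data science, line of business (LOB), auditing and compliance, data engineering, and other roles
• Monitoring of model bias measurements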
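One way to support the audit trail item is an append-only log in which each entry carries a hash of the previous one, so that later edits become detectable. This hash-chaining design is an assumption made for this sketch, not something the slides prescribe, and the field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List


def fingerprint(payload: Dict) -> str:
    """Stable SHA-256 digest of a JSON-serializable record."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


class AuditTrail:
    """Append-only log; each entry stores the hash of the entry before it."""

    def __init__(self) -> None:
        self.entries: List[Dict] = []

    def record(self, actor: str, action: str, model: str, version: int) -> Dict:
        previous = self.entries[-1]["hash"] if self.entries else ""
        body = {
            "actor": actor,
            "action": action,
            "model": model,
            "version": version,
            "at": datetime.now(timezone.utc).isoformat(),
            "previous": previous,
        }
        entry = dict(body, hash=fingerprint(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check that the chain is unbroken."""
        previous = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["previous"] != previous or entry["hash"] != fingerprint(body):
                return False
            previous = entry["hash"]
        return True


trail = AuditTrail()
trail.record("alice", "trained", "fraud", 1)
trail.record("bob", "approved", "fraud", 1)
```

The log records both who built a model version and who approved it, which matches the governance item about tracking approvers and developers.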

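Model discovery capabilities of MLOps solutions
• Curated listings of available models
• Narrative descriptions of models, with various transparency measures that make it easier to select an appropriate model
• Visibility into model versions
• Access control and cost mechanisms for model usage
• Potential for transfer learning and model extension
• Ability to segment listings by category, user access level, and other factors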

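Model security elements provided by an effective MLOps solution
• Access control mechanisms
• Auditing of model use and access
• Protection of models, training data, and settings
• Vulnerability analyses
• Reporting on significant changes to data or operations that affect performance
• Sanitization of data inputs
• Enforcing data privacy through data anonymization
• API and access monitoring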
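Two of these items, input sanitization and anonymization, can be sketched briefly. The allowed fields and ranges below are a hypothetical schema for a fraud model input. Note also that salted hashing is strictly pseudonymization rather than full anonymization, so treat the second function as a simplified stand-in.

```python
import hashlib
import math
from typing import Dict, Optional, Tuple

# Hypothetical input schema: numeric fields carry (min, max) bounds, text fields carry None.
ALLOWED: Dict[str, Optional[Tuple[float, float]]] = {
    "amount": (0.0, 1_000_000.0),
    "merchant_id": None,
}


def sanitize(raw: Dict) -> Dict:
    """Keep only known fields; reject missing, non-finite, or out-of-range values."""
    clean = {}
    for name, bounds in ALLOWED.items():
        if name not in raw:
            raise ValueError(f"missing field: {name}")
        value = raw[name]
        if bounds is not None:
            value = float(value)
            lo, hi = bounds
            if not math.isfinite(value) or not lo <= value <= hi:
                raise ValueError(f"invalid value for {name}")
        else:
            value = str(value).strip()
        clean[name] = value
    return clean


def pseudonymize(identifier: str, salt: str) -> str:
    """Replace an identifier with a salted SHA-256 digest, truncated for readability."""
    return hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()[:16]


cleaned = sanitize({"amount": "12.5", "merchant_id": " m1 ", "extra": "dropped"})
token = pseudonymize("card-holder-42", salt="demo-salt")
```

Unknown fields such as `extra` are dropped, not passed through, so the model only ever sees inputs that were explicitly allowed.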

Components of ML Development and Ops
ML DEV: Machine Learning Model Development
ML OPS: Machine Learning Model Operationalization Management (MLOps)
Components shown: Model Dev.; Data Prep; Model Training; Model Evaluation; Model Packaging; Model Discovery; Model Security; Model Monitoring; Model Transparency; Model Governance; Model Versioning

Core components of ML model operationalization management solutions: capabilities

Model Lifecycle Management
• Model development processes & artifacts: output models; training data sets; test data sets; validation data sets; hyperparameter settings; validation outputs; ensemble models; other key artifacts
• Re-training pipelines
• Model deployment
• Version control (all model assets)
• Configuration and settings management: ensemble; hyperparameter

Model Versioning & Iteration
• Model
• Model development
• Other artifacts: training data sets; hyperparameter settings; output models

Model Monitoring - Dashboard
• Measure management: model latency; performance time; quantity of requests; prediction errors; accuracy; performance measure; F1
• Visibility: data (sent to model); how the model is used
• Drift measurement: model drift; data drift
• Metrics: across slices; user cohorts; operational environments; other segments

Model Governance
• Control: model access; authorization; security
• Auditability: model provenance & auditing (documents: training, testing, deployment); audit trail (recording artifacts, metadata)
• Logging: training, test, validation sets used; accuracy measurements
• Version history & model version usage
• Model bias measure monitoring
• Customizable views of data: operations; data science; model development; LOB; auditing; compliance; data engineering; other roles

Model Discovery (catalogs/registries/marketplaces)
• Curated listings of available models
• Narrative descriptions of models
• Access control & cost mechanisms for model usage
• Visibility into model versions
• Ability to segment lists: category; user access level; other factors
• Potential for transfer learning & model extension

Model Security
• Access control mechanisms
• Auditing of model use & access
• Protection of models, training data, settings
• Vulnerability analyses
• Reporting on significant changes to data
• Sanitization of data inputs
• Enforcing data privacy
• API & access monitoring

Thank you.
Chun MK(chunmk80@gmail.com)
