November 2017 version of the talk 'How Artificial Intelligence Will Innovate the Medicine of the Future'
Yoon Sup Choi, PhD (Director/Founder, Digital Healthcare Institute)
yoonsup.choi@gmail.com
1. How Artificial Intelligence Will Innovate Medicine in the Future
Professor, SAHIST, Sungkyunkwan University
Director, Digital Healthcare Institute
Yoon Sup Choi, Ph.D.
How will artificial intelligence innovate medicine?
2. “It's in Apple's DNA that technology alone is not enough.
It's technology married with liberal arts.”
16. •AP: robots write news articles in place of human journalists
•Capable of producing 2,000 articles per second
•Coverage expanded from the earnings of 300 companies ➞ 3,000 companies
17. • 1978
• As part of the obscure task of “discovery” —
providing documents relevant to a lawsuit — the
studios examined six million documents at a
cost of more than $2.2 million, much of it to pay
for a platoon of lawyers and paralegals who
worked for months at high hourly rates.
• 2011
• Now, thanks to advances in artificial intelligence,
“e-discovery” software can analyze documents
in a fraction of the time for a fraction of the
cost.
• In January, for example, Blackstone Discovery of
Palo Alto, Calif., helped analyze 1.5 million
documents for less than $100,000.
18. “At its height back in 2000, the U.S. cash equities trading desk at
Goldman Sachs’s New York headquarters employed 600 traders,
buying and selling stock on the orders of the investment bank’s
large clients. Today there are just two equity traders left”
19. •Fukoku Mutual Life Insurance in Japan decided to lay off more than 30 employees
who assessed insurance payouts and hand the work to IBM Watson Explorer
•Watson judges whether a claim should be paid based on the medical records
•The switch to AI is expected to raise productivity by 30%
•ROI expected within 2 years
•Year 1: 140M yen
•Year 2: 200M yen
24. •Artificial Narrow Intelligence (ANI)
• AI that excels in one specific domain
• Chess, quiz shows, e-mail filtering, product recommendation, autonomous driving
•Artificial General Intelligence (AGI)
• Human-level AI across all domains
• Reasoning, planning, problem solving, abstraction, learning complex concepts
•Artificial Super Intelligence (ASI)
• AI that surpasses humans in every domain, including science, technology, and social skills
• "Any sufficiently advanced technology is indistinguishable from magic." - Arthur C. Clarke
26. When will machines achieve human-level intelligence?
[Chart: cumulative proportion of survey respondents (10% / 50% / 90% thresholds) by predicted year, 2010-2100, for each survey and their combination]
Surveys: Philosophy and Theory of AI (PT-AI, 2011); Artificial General Intelligence (AGI, 2012); Greek Association for Artificial Intelligence (EETN); survey of the 100 most frequently cited authors (TOP100, 2013); Combined
Superintelligence, Nick Bostrom (2014)
27. Superintelligence: Science or Fiction?
Panelists: Elon Musk (Tesla, SpaceX), Bart Selman (Cornell), Ray Kurzweil (Google),
David Chalmers (NYU), Nick Bostrom (FHI), Demis Hassabis (DeepMind), Stuart
Russell (Berkeley), Sam Harris, and Jaan Tallinn (CSER/FLI)
January 6-8, 2017, Asilomar, CA
https://brunch.co.kr/@kakao-it/49
https://www.youtube.com/watch?v=h0962biiZa4
28. Superintelligence: Science or Fiction?
Panelists: Elon Musk (Tesla, SpaceX), Bart Selman (Cornell), Ray Kurzweil (Google),
David Chalmers (NYU), Nick Bostrom (FHI), Demis Hassabis (DeepMind), Stuart
Russell (Berkeley), Sam Harris, and Jaan Tallinn (CSER/FLI)
January 6-8, 2017, Asilomar, CA
Q: Is superintelligence a reachable domain? - All nine panelists: YES
Q: Do you think an entity with superintelligence can actually emerge? - All nine panelists: YES
Q: Do you hope superintelligence will be realized?
- YES: Ray Kurzweil, Nick Bostrom, Demis Hassabis
- It's complicated: Elon Musk, Stuart Russell, Bart Selman, David Chalmers, Sam Harris, Jaan Tallinn
https://brunch.co.kr/@kakao-it/49
https://www.youtube.com/watch?v=h0962biiZa4
31. Superintelligence, Nick Bostrom (2014)
Once strong AI at the human baseline is achieved,
the subsequent take-off to superintelligence
may take an extremely short time.
How far to superintelligence?
32. •Artificial Narrow Intelligence (ANI)
• AI that excels in one specific domain
• Chess, quiz shows, e-mail filtering, product recommendation, autonomous driving
•Artificial General Intelligence (AGI)
• Human-level AI across all domains
• Reasoning, planning, problem solving, abstraction, learning complex concepts
•Artificial Super Intelligence (ASI)
• AI that surpasses humans in every domain, including science, technology, and social skills
• "Any sufficiently advanced technology is indistinguishable from magic." - Arthur C. Clarke
40. •Analyzing complex medical data and deriving insights
•Analyzing/reading medical imaging and pathology data
•Monitoring continuous data for prevention/prediction
Medical applications of artificial intelligence
41. •Analyzing complex medical data and deriving insights
•Analyzing/reading medical imaging and pathology data
•Monitoring continuous data for prevention/prediction
Medical applications of artificial intelligence
44. 600,000 pieces of medical evidence
2 million pages of text from 42 medical journals and clinical trials
69 guidelines, 61,540 clinical trials
IBM Watson on Medicine
Watson learned...
+
1,500 lung cancer cases
physician notes, lab results and clinical research
+
14,700 hours of hands-on training
50. Annals of Oncology (2016) 27 (suppl_9): ix179-ix180. 10.1093/annonc/mdw601
Validation study to assess performance of IBM cognitive
computing system Watson for oncology with Manipal
multidisciplinary tumour board for 1000 consecutive cases:
An Indian experience
• MMDT (Manipal multidisciplinary tumour board) treatment recommendations and
data for 1,000 cases of 4 different cancers - breast (638), colon (126), rectum (124)
and lung (112) - treated over the last 3 years were collected.
• Of the treatment recommendations given by the MMDT, WFO classified
50% as REC, 28% as FC, and 17% as NREC
• Nearly 80% of the recommendations fell in the WFO REC and FC groups
• 5% of the treatments given by the MMDT were not available in WFO
• The degree of concordance varied depending on the type of cancer
• WFO-REC was highest in rectum (85%) and lowest in lung (17.8%)
• high with TNBC (67.9%); lower with HER2-negative (35%)
• WFO took a median of 40 sec to capture, analyze, and give the treatment
(vs a median of 15 min for the MMDT)
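As a sanity check on the figures above, the concordance buckets combine in a couple of lines; a minimal sketch in Python (the category shares are taken from the bullets, everything else is illustrative):

    # Share of MMDT recommendations falling in each Watson for Oncology bucket.
    buckets = {"REC": 0.50, "FC": 0.28, "NREC": 0.17, "not_available": 0.05}

    # "Concordant" is commonly defined as REC + FC (recommended or for consideration).
    concordance = buckets["REC"] + buckets["FC"]
    print(f"REC + FC concordance: {concordance:.0%}")  # 78%, i.e. "nearly 80%"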
51. San Antonio Breast Cancer Symposium, December 6-10, 2016
Concordance of WFO (@T2) and MMDT (@T1* vs T2**) (N = 638 breast cancer cases)
Time point | REC n (%) | REC + FC n (%)
T1*        | 296 (46%) | 463 (73%)
T2**       | 381 (60%) | 574 (90%)
* T1: time of the original treatment decision by the MMDT in the past (last 1-3 years)
** T2: time (2016) of WFO's treatment advice and of the MMDT's treatment decision upon blinded re-review of non-concordant cases
52. Sung Won Park, APFCP, 2017
Assessing the performance of Watson for Oncology using colon
cancer cases treated with surgery and adjuvant chemotherapy
at Gachon University Gil Medical Center
• Stage II (high-risk) and stage III colon cancer patients (N = 162)
• Retrospective study: September 1, 2014 to August 31, 2016
• Gachon University Gil Medical Center (GMC)
• GMC's actual treatments were generally concordant with WFO (REC + FC) in 83.3% of cases:
• WFO-REC: 53.1%
• WFO-FC: 30.2%
• WFO-NREC: 13.0%
• Not included in WFO: 3.7%
53. WFO in ASCO 2017
• Concordance assessment of a cognitive computing system in Thailand (Bumrungrad International Hospital)
• 211 patients, 2015-2016 (92 retrospective; 119 prospective)
• Concordance
• Overall: 83%
• Colorectal 89%, lung 91%, breast 76%, gastric 78%
54. WFO in ASCO 2017
• Early experience with the IBM WFO cognitive computing system for lung
and colorectal cancer treatment (Manipal Hospitals)
• Over the past 3 years: lung cancer (112), colon cancer (126), rectal cancer (124)
• Lung cancer: localized 88.9%, metastatic 97.9%
• Colon cancer: localized 85.5%, metastatic 76.6%
• Rectal cancer: localized 96.8%, metastatic 80.6%
55. WFO in ASCO 2017
• Early experience with the IBM WFO cognitive computing system for lung
and colorectal cancer treatment (Manipal Hospitals)
• Over the past 3 years: lung cancer (112), colon cancer (126), rectal cancer (124)
• Lung cancer: localized 88.9%, metastatic 97.9%
• Colon cancer: localized 85.5%, metastatic 76.6%
• Rectal cancer: localized 96.8%, metastatic 80.6%
Performance of WFO in India
2017 ASCO Annual Meeting, J Clin Oncol 35, 2017 (suppl; abstr 8527)
56. WFO in ASCO 2017
•Use of a cognitive computing system for treatment of colon and gastric
cancer in South Korea (Gachon University Gil Medical Center)
• 2012-2016
• 340 colon cancer patients (stage II-IV)
• 185 advanced gastric cancer patients (retrospective)
• Concordance
• All colon cancer patients (340): 73%
• 250 patients who received adjuvant chemotherapy: 85%
• 90 metastatic patients: 40%
• All gastric cancer patients: 49%
• Trastuzumab/FOLFOX is not reimbursed by the Korean national health insurance
• S-1 (tegafur, gimeracil and oteracil) + cisplatin:
• very routine in Korea; not used in the US
57. Tentative conclusions
•The concordance between Watson for Oncology and physicians:
•differs by cancer type.
•differs by stage within the same cancer type.
•differs by hospital and country for the same cancer type.
•may change over time.
58. WHY?
•Differences in national guidelines
• WFO is fundamentally based on MSKCC practice
• Differences in ethnicity, approved drugs, and insurance systems
•Updates to the NCCN guidelines
•Differences in the diversity of treatment options by cancer type
• Lung cancer: many options vs rectal cancer: few
• TNBC: few options vs HER2 (-): many
60. •How should WFO's accuracy be proven? A clinical trial!
Watson for Oncology vs. medical oncologist(s)
10,000 cancer patients in each arm
Would such a clinical trial be feasible?
• Prospective, single-blind randomized trial
• Primary endpoint: overall survival (OS)
• Secondary endpoint: progression-free survival (PFS)
61. •How should WFO's accuracy be proven? A clinical trial!
Medical oncologist(s) vs. Watson for Oncology
10,000 cancer patients in each arm
• Prospective, single-blind randomized trial
• Primary endpoint: overall survival (OS)
• Secondary endpoint: progression-free survival (PFS)
Objections:
•Treating patients with WFO alone would be unethical (new drugs are first validated preclinically).
•Physicians' skills are heterogeneous, so the trial's results would be hard to generalize.
•Watson keeps evolving, so past trial results would be hard to apply to the present.
62. •How should WFO's accuracy be proven? A clinical trial!
NCCN guidelines vs. NCCN guidelines + Watson for Oncology
10,000 cancer patients in each arm
• Prospective, single-blind randomized trial
• Primary endpoint: overall survival (OS)
• Secondary endpoint: progression-free survival (PFS)
Such a study would be feasible, but its outcome is hard to predict.
•Treating patients with WFO alone would be unethical (new drugs are first validated preclinically).
•Physicians' skills are heterogeneous, so the trial's results would be hard to generalize.
•Watson keeps evolving, so past trial results would be hard to apply to the present.
•Would IBM actually want to run this study?
63. Meeting with Dr. Kyu Rhee,
Chief Health Officer of Watson Health
(July 4, 2017)
64. Four factors to validate
clinical benefits of WFO
•Patient Outcomes
•human doctor vs human + WFO
•mortality and morbidity
•Cost (reduction in healthcare spending)
•Does it cut costs through lower recurrence and readmission rates?
•Doctor's Satisfaction
•Does WFO improve physicians' clinical workflow?
•What is the user experience of clinicians using WFO?
•Patient Satisfaction
•Do patients want WFO to be used in their care?
65. Principles are needed
•For which patients should Watson be consulted?
•How much should Watson be trusted (by cancer type)?
•Should Watson's recommendation be disclosed to the patient?
•What should happen when Watson and the care team disagree?
•Can Watson's advice be covered by insurance reimbursement?
Quality of care and treatment outcomes may depend on these criteria,
yet today each hospital applies its own ad hoc standards.
67. "Empowering the Oncology Community for Cancer Care"
Genomics / Oncology / Clinical Trial Matching
Watson Health's oncology clients span more than 35 hospital systems
Andrew Norden, KOTRA Conference, March 2017, "The Future of Health is Cognitive"
69. •Over 16 weeks, 2,620 lung and breast cancer patients at HOG (Highlands Oncology Group) were screened
•90 patients were matched against three Novartis breast cancer trial protocols
•Clinical trial coordinator: 1 hour 50 minutes
•Watson CTM: 24 minutes (a 78% time reduction)
•Watson CTM automatically screened out the 94% of patients who did not meet trial eligibility criteria
70. Watson Genomics Overview
Watson Genomics Content
• 20+ content sources, including:
• Medical articles (23 million)
• Drug information
• Clinical trial information
• Genomic information
Pipeline: Case sequenced (VCF / MAF, Log2, Dge) ➞ Encryption ➞ Molecular profile
analysis ➞ Pathway analysis ➞ Drug analysis ➞ Service analysis, reports, & visualizations
75. IBM Watson Health
Organizations Leveraging Watson
Watson for Oncology
Best Doctors (second opinion)
Bumrungrad International Hospital
Confidential client (Bangladesh and Nepal)
Gachon University Gil Medical Center (Korea)
Hangzhou Cognitive Care – 50+ Chinese hospitals
Jupiter Medical Center
Manipal Hospitals – 16 Indian Hospitals
MD Anderson (**Oncology Expert Advisor)
Memorial Sloan Kettering Cancer Center
MRDM - Zorg (Netherlands)
Pusan National University Hospital
Clinical Trial Matching
Best Doctors (second opinion)
Confidential – Major Academic Center
Highlands Oncology Group
Froedtert & Medical College of Wisconsin
Mayo Clinic
Multiple Life Sciences pilots
Watson Genomic Analytics
Ann & Robert H Lurie Children’s Hospital of Chicago
BC Cancer Agency
City of Hope
Cleveland Clinic
Columbia University, Irving Cancer Center
Duke Cancer Institute
Fred & Pamela Buffett Cancer Center
Fleury (Brazil)
Illumina 170 Gene Panel
NIH Japan
McDonnell Institute at Washington University in St. Louis
New York Genome Center
Pusan National University Hospital
Quest Diagnostics
Stanford Health
University of Kansas Cancer Center
University of North Carolina Lineberger Cancer Center
University of Southern California
University of Washington Medical Center
University of Tokyo
Yale Cancer Center
Andrew Norden, KOTRA Conference, March 2017, “The Future of Health is Cognitive”
82. • Erosion of human physicians' authority due to AI
• Greater patient self-determination and empowerment
• Need to change how doctors practice and how they are trained
http://news.donga.com/3/all/20170320/83400087/1
83. • What happens when the doctor and Watson disagree?
• Watson appears to sometimes give recommendations that differ from the NCCN guidelines
• About 5 cases out of some 100
• Can the patients' choices be considered rational?
• Watson's accuracy has not been validated
• Likely influenced by buzzwords such as the 'Fourth Industrial Revolution'
• Doesn't this call for a clinical trial?
• Patient preference affects the adoption rate of AI
• Factors affecting hospital adoption:
• analytical validity
• clinical validity/utility
• physicians' perceptions/psychological factors
• patients' perceptions/psychological factors
• regulatory environment (approval, reimbursement, etc.)
• Ultimately, if patients want it (regardless of whether it is medically sound),
hospital adoption can only keep growing
84. • Patient response to Watson has been much better than expected
• 85 cancer patients seen within 2 months of adoption
• Likely faster uptake than Gil Medical Center itself projected
• Reportedly, transfer inquiries from the Big 5 hospitals to Gil Medical Center are increasing
• Professors are said to discuss cases more thoroughly when seeing patients
85. • Pusan National University Hospital (January 2017)
• Adopted two Watson solutions:
• Watson for Oncology
• Watson for Genomics
86. • Konyang University Hospital adopted Watson for Oncology
• March 2017
• "Local patients will no longer need to shuttle among hospitals in the capital
region," said Choi Won-jun, director of Konyang University Hospital. "Building
on the synergy between our excellent multidisciplinary team and the AI medical
system, we will provide cancer patients with the best possible care."
89. •Predicting "whether a first cardiovascular event will occur within the next 10 years"
•Prospective cohort study: 378,256 patients in the UK
•The first large-scale study to predict disease with machine learning from routine clinical data
•Compared the accuracy of the established ACC/AHA guideline against four machine-learning algorithms (see the sketch below):
•Random forest; logistic regression; gradient boosting; neural network
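A minimal sketch of this kind of four-way comparison in Python with scikit-learn; the synthetic data here merely stands in for the study's routine clinical variables and 10-year CVD outcomes, and all hyperparameters are illustrative:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import roc_auc_score

    # Synthetic stand-in: 30 clinical variables, ~7% event rate.
    X, y = make_classification(n_samples=5000, n_features=30, weights=[0.93], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.22, random_state=0)

    models = {
        "Random Forest": RandomForestClassifier(n_estimators=500, random_state=0),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Gradient Boosting": GradientBoostingClassifier(random_state=0),
        "Neural Network": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
    }

    # Compare discrimination by AUC, as the paper does with the c-statistic.
    for name, model in models.items():
        model.fit(X_train, y_train)
        auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
        print(f"{name}: AUC = {auc:.3f}")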
90. Can machine-learning improve cardiovascular
risk prediction using routine clinical data?
Stephen F. Weng et al., PLoS ONE 2017
[Excerpt] "...resulting in a sensitivity of 62.7% and PPV of 17.1%. The random forest
algorithm resulted in a net increase of 191 CVD cases from the baseline model,
increasing the sensitivity to 65.3% and PPV to 17.8%, while logistic regression
resulted in a net increase of 324 CVD cases (sensitivity 67.1%; PPV 18.3%). Gradient
boosting machines and neural networks performed best, resulting in a net increase
of 354 (sensitivity 67.5%; PPV 18.4%) and 355 (sensitivity 67.5%; PPV 18.4%) CVD
cases correctly predicted, respectively. The ACC/AHA baseline model correctly
predicted 53,106 non-cases from 75,585 total non-cases, resulting in a specificity
of 70.3% and NPV of 95.1%. The net increase in non-cases..."
[Table 3 (not reproduced): top 10 risk factor variables for the CVD algorithms, listed in
descending order of coefficient effect size (ACC/AHA; logistic regression), weighting
(neural networks), or selection frequency (random forest, gradient boosting machines);
the algorithms were derived from a training cohort of 295,267 patients. Beyond the
classic ACC/AHA factors (age, total and HDL cholesterol, smoking, blood pressure,
diabetes), the machine-learning algorithms selected variables such as ethnicity,
Townsend deprivation index, atrial fibrillation, chronic kidney disease, severe mental
illness, oral corticosteroid prescription, HbA1c, triglycerides, BMI, COPD, and family
history of premature CHD; italics in the original mark protective factors.]
https://doi.org/10.1371/journal.pone.0174944.t003
•Only some of the risk factors in the ACC/AHA guideline were also selected by the machine-learning algorithms
•Notably, diabetes was not included in any of the four models
•New factors absent from existing risk-prediction tools were included, such as:
•COPD, severe mental illness, prescription of oral corticosteroids
•biomarkers such as triglyceride level
91. Can machine-learning improve cardiovascular
risk prediction using routine clinical data?
Stephen F. Weng et al., PLoS ONE 2017
[Excerpt] "...correctly predicted compared to the baseline ACC/AHA model ranged from
191 non-cases for the random forest algorithm to 355 non-cases for the neural networks.
... Compared to an established AHA/ACC risk prediction algorithm, we found all
machine-learning algorithms tested were better at identifying individuals who will
develop CVD and those that will not. Unlike established approaches to risk prediction,
the machine-learning methods used were not limited to a small set of risk factors, and
incorporated more pre-existing..."
Table 4. Performance of the machine-learning (ML) algorithms predicting 10-year
cardiovascular disease (CVD) risk, derived by applying the training algorithms to the
validation cohort of 82,989 patients. A higher c-statistic means better discrimination;
the baseline (BL) ACC/AHA algorithm is shown for comparison. Standard errors were
estimated by a jack-knife procedure.
Algorithm | AUC c-statistic | SE | 95% CI | Absolute change from baseline
BL: ACC/AHA | 0.728 | 0.002 | 0.723-0.735 | -
ML: Random Forest | 0.745 | 0.003 | 0.739-0.750 | +1.7%
ML: Logistic Regression | 0.760 | 0.003 | 0.755-0.766 | +3.2%
ML: Gradient Boosting Machines | 0.761 | 0.002 | 0.755-0.766 | +3.3%
ML: Neural Networks | 0.764 | 0.002 | 0.759-0.769 | +3.6%
https://doi.org/10.1371/journal.pone.0174944.t004
•All four machine-learning models were more accurate than the existing ACC/AHA guideline
•Neural networks were the most accurate, with AUC = 0.764
•"Using this model would have correctly predicted 355 additional cardiovascular events"
•Accuracy could improve further with deep learning
•Additional risk factors such as genetic information could be incorporated
92. Deepr: A Convolutional Net for Medical Records
[Figure 1. Overview of Deepr for predicting future risk from a medical record. The
top-left box depicts an example record with multiple visits, each containing multiple
coded objects (diagnoses & procedures); the future risk is unknown (question mark).
Steps: (1) the medical record is sequenced into phrases separated by coded
time-gaps/transfers; then, bottom to top: (2) words are embedded into continuous
vectors, (3) local word vectors are convolved to detect local motifs, (4) max-pooling
derives a record-level vector, (5) a classifier predicts an output, which is a future event.]
Sequencing the EMR transforms a record into a sentence - a sequence of words in which
diagnoses are letter-prefixed codes and procedures are digits, e.g.:
1910 Z83 911 1008 D12 K31 1-3m R94 RAREWORD H53 Y83 M62 Y92 E87 T81 RAREWORD RAREWORD 1893 D12
•Predicts "will a discharged patient be readmitted within 6 months?"
•Prediction from EMR data
•Validated on 300,000 patients in Australia
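A minimal sketch of the Deepr-style architecture described above - embed the coded visit sequence, convolve to detect local motifs, max-pool to a record-level vector, then classify. This is written in Python with PyTorch; the vocabulary size, dimensions, and dummy inputs are illustrative, not from the paper:

    import torch
    import torch.nn as nn

    class Deepr(nn.Module):
        """Embedding -> 1D convolution (motif detection) -> max-pooling -> classifier."""
        def __init__(self, vocab_size=2000, embed_dim=100, n_filters=100, kernel_size=3):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size)
            self.fc = nn.Linear(n_filters, 1)

        def forward(self, codes):   # codes: (batch, seq of diagnosis/procedure/time-gap tokens)
            x = self.embed(codes)                          # (batch, seq_len, embed_dim)
            x = torch.relu(self.conv(x.transpose(1, 2)))   # (batch, n_filters, seq_len - k + 1)
            x = x.max(dim=2).values                        # record-level vector via max-pooling
            return torch.sigmoid(self.fc(x))               # P(readmission within 6 months)

    model = Deepr()
    record = torch.randint(1, 2000, (4, 50))   # 4 dummy records, 50 tokens each
    print(model(record).shape)                 # torch.Size([4, 1])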
93. •Analyzing complex medical data and deriving insights
•Analyzing/reading medical imaging and pathology data
•Monitoring continuous data for prevention/prediction
Medical applications of artificial intelligence
96. [Fig. 4 from Russakovsky et al.: a random selection of images in the ILSVRC
detection validation set. The top rows were taken from the ILSVRC2012 single-object
localization validation set; the bottom rows were collected from Flickr using
scene-level queries.]
http://arxiv.org/pdf/1409.0575.pdf
97. • Main competition
• Classification: classify the objects in an image
• Localization: classify and localize 'one' object in the image
• Object detection: classify and localize 'all' objects in the image
[Fig. 7 from Russakovsky et al.: tasks in ILSVRC. The first column shows the
ground-truth labeling on an example image; the next three show sample outputs
with the corresponding evaluation scores.]
http://arxiv.org/pdf/1409.0575.pdf
98. Performance of winning entries in the ILSVRC2010-2015 competitions
in each of the three tasks
[Charts: image classification error (2010-2015), single-object localization error
(2011-2015), and object detection average precision (2013-2015), all improving
sharply year over year]
http://image-net.org/challenges/LSVRC/2015/results#loc
100. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, “Deep Residual Learning for Image Recognition”, 2015
How deep is deep?
104. DeepFace: Closing the Gap to Human-Level
Performance in Face Verification
Taigman, Y. et al. (2014). DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR'14.
[Figure 2. Outline of the DeepFace architecture: a front-end of a single
convolution-pooling-convolution filtering on the rectified input, followed by three
locally-connected layers and two fully-connected layers. The net includes more than
120 million parameters, over 95% of which come from the local and fully connected
layers. The locally connected layers apply a filter bank like a convolutional layer,
but every location in the feature map learns a different set of filters, since
different regions of an aligned face have different local statistics. Training
maximizes the probability of the correct class (face id) by minimizing the
cross-entropy loss L = -log p_k, where k is the index of the true label.]
Human: 95% vs. DeepFace (Facebook): 97.35%
Recognition accuracy on the Labeled Faces in the Wild (LFW) dataset (13,233 images, 5,749 people)
105. FaceNet: A Unified Embedding for Face
Recognition and Clustering
Schroff, F. et al. (2015). FaceNet: A Unified Embedding for Face Recognition and Clustering
Human: 95% vs. FaceNet (Google): 99.63%
Recognition accuracy on the Labeled Faces in the Wild (LFW) dataset (13,233 images, 5,749 people)
[Paper highlights: of the 13 LFW errors shown in the paper's Figure 6, only eight are
actual errors - the other four images are mislabeled in LFW. On the YouTube Faces DB,
averaging the similarity of all pairs of the first 100 frames the face detector finds
in each video gives 95.12% ± 0.39 classification accuracy (95.18% with the first
1,000 frames), roughly halving the error rate of prior work (91.4%) and cutting
DeepId2+'s error (93.2%) by 30%. The method learns an embedding directly into a
Euclidean space for face verification, end to end, without CNN bottleneck layers or
post-processing such as model concatenation, PCA, or SVM classification. The compact
embedding also supports face clustering, e.g. grouping a user's personal photos by
identity with striking invariance to occlusion, lighting, pose, and even age.]
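What these accuracy numbers measure is, at bottom, a verification decision: are two face embeddings closer than a tuned threshold? A minimal sketch of that step in Python, assuming a trained embedding network is available (faked here with random vectors; the 128-D size matches FaceNet, the threshold value is illustrative):

    import numpy as np

    def l2_normalize(v):
        return v / np.linalg.norm(v)

    def same_person(emb_a, emb_b, threshold=1.1):
        """FaceNet-style verification: squared L2 distance between unit-length
        embeddings below a tuned threshold means 'same identity'."""
        d = np.sum((l2_normalize(emb_a) - l2_normalize(emb_b)) ** 2)
        return d < threshold

    # Stand-ins for the 128-D embeddings a trained network would produce.
    rng = np.random.default_rng(0)
    a, b = rng.normal(size=128), rng.normal(size=128)
    print(same_person(a, b))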
106. Show and Tell:
A Neural Image Caption Generator
Vinyals, O. et al. (2015). Show and Tell: A Neural Image Caption Generator, arXiv:1411.4555
[Figure 1. NIC is an end-to-end neural network consisting of a vision CNN followed by
a language-generating RNN; it maps an image to a caption such as "A group of people
shopping at an outdoor market. There are many vegetables at the fruit stand."]
107. Show and Tell:
A Neural Image Caption Generator
Vinyals, O. et al. (2015). Show and Tell:A Neural Image Caption Generator, arXiv:1411.4555
Figure 5. A selection of evaluation results, grouped by human rating.
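A minimal sketch of this encoder-decoder idea in Python with PyTorch: the CNN encodes the image into a feature vector that seeds an RNN generating the caption word by word. The paper pairs an Inception-style CNN with an LSTM; here a small ResNet stands in, and the vocabulary size and dimensions are illustrative:

    import torch
    import torch.nn as nn
    from torchvision import models

    class CaptionNet(nn.Module):
        def __init__(self, vocab_size=10000, embed_dim=256, hidden_dim=512):
            super().__init__()
            cnn = models.resnet18(weights=None)      # vision CNN (Inception in the paper)
            cnn.fc = nn.Linear(cnn.fc.in_features, embed_dim)
            self.cnn = cnn
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, image, caption_tokens):
            img_feat = self.cnn(image).unsqueeze(1)  # image features act as the first "word"
            words = self.embed(caption_tokens)
            seq = torch.cat([img_feat, words], dim=1)
            h, _ = self.lstm(seq)
            return self.out(h)                       # next-word logits at each step

    model = CaptionNet()
    logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 10000, (2, 12)))
    print(logits.shape)  # torch.Size([2, 13, 10000])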
112. Business Area
Medical Image Analysis
VUNOnet and our machine learning technology will help doctors and hospitals manage
medical scans and images intelligently, making diagnosis faster and more accurate.
[Example: original image ➞ automatic segmentation into Normal / Emphysema / Reticular Opacity]
Our system finds DILDs with the highest accuracy (*DILDs: diffuse interstitial lung disease)
Digital Radiologist
Collaboration with Prof. Joon Beom Seo (Asan Medical Center)
Analysed 1,200 patients over 3 months
114. Digital Radiologist
Med Phys. 2013 May;40(5):051912. doi: 10.1118/1.4802214.
Collaboration with Prof. Joon Beom Seo (Asan Medical Center)
Analysed 1,200 patients over 3 months
115. Digital Radiologist
Med Phys. 2013 May;40(5):051912. doi: 10.1118/1.4802214.
Collaboration with Prof. Joon Beom Seo (Asan Medical Center)
Analysed 1,200 patients over 3 months
116. Digital Radiologist
Med Phys. 2013 May;40(5):051912. doi: 10.1118/1.4802214.
Collaboration with Prof. Joon Beom Seo (Asan Medical Center)
Analysed 1,200 patients over 3 months
Feature Engineering vs Feature Learning
• Visualization of hand-crafted features vs learned features in 2D
117. Bench to Bedside: Practical Applications
• Contents-based Case Retrieval
–Finding similar cases with a clinically matching context - a search engine for medical images.
–Clinicians can refer to the diagnoses and prognoses of past similar patients to make better clinical decisions.
–Accepted for presentation at RSNA 2017
Digital Radiologist
119. •Zebra Medical Vision launched a service that reads radiology scans for $1 each (October 2017)
•The final menu is not yet confirmed, but is expected to include Pulmonary Hypertension, Lung Nodule, Fatty Liver, Emphysema,
Coronary Calcium Scoring, Bone Mineral Density, and Aortic Aneurysm
https://www.zebra-med.com/aione/
120. Zebra Medical Vision's AI1: AI at Your Fingertips
https://www.youtube.com/watch?v=0PGgCpXa-Fs
122. Diabetic retinopathy
• A major complication of diabetes: develops in 90% of patients with a diabetes history of 30+ years
• Ophthalmologists photograph the fundus (the inside of the eye) and read the images
• Diagnosis is based on the degree of retinal microvascular proliferation, hemorrhage, and exudates
123. Development and Validation of a Deep Learning Algorithm
for Detection of Diabetic Retinopathy
in Retinal Fundus Photographs
Varun Gulshan, PhD; Lily Peng, MD, PhD; Marc Coram, PhD; Martin C. Stumpe, PhD; Derek Wu, BS; Arunachalam Narayanaswamy, PhD;
Subhashini Venugopalan, MS; Kasumi Widner, MS; Tom Madams, MEng; Jorge Cuadros, OD, PhD; Ramasamy Kim, OD, DNB;
Rajiv Raman, MS, DNB; Philip C. Nelson, BS; Jessica L. Mega, MD, MPH; Dale R. Webster, PhD
IMPORTANCE Deep learning is a family of computational methods that allow an algorithm to
program itself by learning from a large set of examples that demonstrate the desired
behavior, removing the need to specify rules explicitly. Application of these methods to
medical imaging requires further assessment and validation.
OBJECTIVE To apply deep learning to create an algorithm for automated detection of diabetic
retinopathy and diabetic macular edema in retinal fundus photographs.
DESIGN AND SETTING A specific type of neural network optimized for image classification
called a deep convolutional neural network was trained using a retrospective development
data set of 128 175 retinal images, which were graded 3 to 7 times for diabetic retinopathy,
diabetic macular edema, and image gradability by a panel of 54 US licensed ophthalmologists
and ophthalmology senior residents between May and December 2015. The resultant
algorithm was validated in January and February 2016 using 2 separate data sets, both
graded by at least 7 US board-certified ophthalmologists with high intragrader consistency.
EXPOSURE Deep learning–trained algorithm.
MAIN OUTCOMES AND MEASURES The sensitivity and specificity of the algorithm for detecting
referable diabetic retinopathy (RDR), defined as moderate and worse diabetic retinopathy,
referable diabetic macular edema, or both, were generated based on the reference standard
of the majority decision of the ophthalmologist panel. The algorithm was evaluated at 2
operating points selected from the development set, one selected for high specificity and
another for high sensitivity.
RESULTS The EyePACS-1 data set consisted of 9963 images from 4997 patients (mean age, 54.4
years; 62.2% women; prevalence of RDR, 683/8878 fully gradable images [7.8%]); the
Messidor-2 data set had 1748 images from 874 patients (mean age, 57.6 years; 42.6% women;
prevalence of RDR, 254/1745 fully gradable images [14.6%]). For detecting RDR, the algorithm
had an area under the receiver operating curve of 0.991 (95% CI, 0.988-0.993) for EyePACS-1 and
0.990 (95% CI, 0.986-0.995) for Messidor-2. Using the first operating cut point with high
specificity, for EyePACS-1, the sensitivity was 90.3% (95% CI, 87.5%-92.7%) and the specificity
was 98.1% (95% CI, 97.8%-98.5%). For Messidor-2, the sensitivity was 87.0% (95% CI, 81.1%-
91.0%) and the specificity was 98.5% (95% CI, 97.7%-99.1%). Using a second operating point
with high sensitivity in the development set, for EyePACS-1 the sensitivity was 97.5% and
specificity was 93.4% and for Messidor-2 the sensitivity was 96.1% and specificity was 93.9%.
CONCLUSIONS AND RELEVANCE In this evaluation of retinal fundus photographs from adults
with diabetes, an algorithm based on deep machine learning had high sensitivity and
specificity for detecting referable diabetic retinopathy. Further research is necessary to
determine the feasibility of applying this algorithm in the clinical setting and to determine
whether use of the algorithm could lead to improved care and outcomes compared with
current ophthalmologic assessment.
JAMA. doi:10.1001/jama.2016.17216
Published online November 29, 2016.
126. Training Set / Test Set
• A CNN was retrospectively trained on 128,175 fundus images
• Each image was graded 3-7 times by a panel of 54 US-licensed ophthalmologists
• The algorithm's readings were compared against those of 7-8 top ophthalmologists
• EyePACS-1 (9,963 images), Messidor-2 (1,748 images)
[eFigure 2. Screenshot of the second screen of the grading tool, which asks graders
to assess the image for DR, DME, and other notable conditions or findings:
a) fullscreen mode; b) a reset button that reloads the image and clears all grading;
c) a comment box for other pathologies seen.]
127. • AUC = 0.991 for EyePACS-1 and 0.990 for Messidor-2
• Sensitivity and specificity on par with the panel of 7-8 ophthalmologists
• F-score: 0.95 (vs. 0.91 for the human ophthalmologists)
Additional sensitivity analyses were conducted for several subcategories, e.g.
detecting moderate or worse diabetic retinopathy; the effect of data set size on
algorithm performance was examined and shown to plateau at around 60,000 images.
Figure 2. Validation Set Performance for Referable Diabetic Retinopathy
[ROC curves: A, EyePACS-1 (AUC, 99.1%; 95% CI, 98.8%-99.3%) and B, Messidor-2
(AUC, 99.0%; 95% CI, 98.6%-99.5%), each marking the high-sensitivity and
high-specificity operating points.]
Performance of the algorithm (black curve) and ophthalmologists (colored
circles) for the presence of referable diabetic retinopathy (moderate or worse
diabetic retinopathy or referable diabetic macular edema) on A, EyePACS-1
(8788 fully gradable images) and B, Messidor-2 (1745 fully gradable images).
The black diamonds on the graph correspond to the sensitivity and specificity of
the algorithm at the high-sensitivity and high-specificity operating points.
In A, for the high-sensitivity operating point, specificity was 93.4% (95% CI,
92.8%-94.0%) and sensitivity was 97.5% (95% CI, 95.8%-98.7%); for the
high-specificity operating point, specificity was 98.1% (95% CI, 97.8%-98.5%)
and sensitivity was 90.3% (95% CI, 87.5%-92.7%). In B, for the high-sensitivity
operating point, specificity was 93.9% (95% CI, 92.4%-95.3%) and sensitivity
was 96.1% (95% CI, 92.4%-98.3%); for the high-specificity operating point,
specificity was 98.5% (95% CI, 97.7%-99.1%) and sensitivity was 87.0% (95%
CI, 81.1%-91.0%). There were 8 ophthalmologists who graded EyePACS-1 and 7
ophthalmologists who graded Messidor-2. AUC indicates area under the
receiver operating characteristic curve.
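For reference, the F-score quoted above is the harmonic mean of precision (PPV) and recall (sensitivity); a minimal Python example with illustrative values (the paper does not report the exact precision/recall pair behind its 0.95):

    def f_score(precision, recall):
        # Harmonic mean of precision (PPV) and recall (sensitivity)
        return 2 * precision * recall / (precision + recall)

    print(round(f_score(0.93, 0.97), 2))  # 0.95 with these illustrative values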
131. [Esteva et al., Nature 2017 (Letter). On a three-way classification task, the
CNN achieves 72.1 ± 0.9% (mean ± s.d.) overall accuracy (the average of individual
inference-class accuracies), while two dermatologists attain 65.56% and 66.0% on a
subset of the validation set. The algorithm was also validated on a nine-class disease
partition (second-level taxonomy nodes grouping diseases with similar treatment
plans). Two trials compared CNN and dermatologists, one using standard images and the
other dermoscopy images, reflecting the two steps a dermatologist takes to form a
clinical impression; the same CNN is used for all tasks, and the comparison metrics
are sensitivity and specificity.]
[Figure 2a. Deep CNN layout: an image of a skin lesion (e.g., melanoma) is
sequentially warped into a probability distribution over clinical classes of skin
disease using the Google Inception v3 architecture, pretrained on ImageNet
(1.28 million images, 1,000 generic object classes) and fine-tuned on 129,450 skin
lesions comprising 2,032 different diseases. Training classes (757) are defined by a
novel taxonomy of skin disease and a partitioning algorithm that maps diseases into
training classes (e.g., acrolentiginous melanoma, amelanotic melanoma, lentigo
melanoma). Inference classes are more general, combining one or more training classes
(e.g., malignant melanocytic lesions, the class of melanomas); the probability of an
inference class is computed by summing the probabilities of its training classes over
the taxonomy. Example output: 92% malignant melanocytic lesion, 8% benign melanocytic
lesion.]
GoogleNet Inception v3
• A dataset of 129,450 skin lesion images built in-house
• Data curated by 18 US dermatologists
• Images learned with a CNN (Inception v3; see the fine-tuning sketch below)
• The algorithm's readings compared with those of 21 dermatologists on three tasks:
• Distinguishing keratinocyte carcinoma from benign seborrheic keratosis
• Distinguishing malignant melanoma from benign lesions (standard images)
• Distinguishing malignant melanoma from benign lesions (dermoscopy images)
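A minimal sketch of the transfer-learning recipe named above - an ImageNet-pretrained Inception v3 with its classification head replaced and fine-tuned on lesion images. The paper used Google's TensorFlow implementation; this sketch uses torchvision's, and the training-loop details are illustrative:

    import torch
    import torch.nn as nn
    from torchvision import models

    n_classes = 757                                        # training classes in the paper's taxonomy
    model = models.inception_v3(weights="IMAGENET1K_V1")   # pretrained on ImageNet
    model.fc = nn.Linear(model.fc.in_features, n_classes)  # replace the classification head
    model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, n_classes)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    def train_step(images, labels):    # images: (batch, 3, 299, 299) lesion photos
        model.train()
        optimizer.zero_grad()
        logits, aux_logits = model(images)                 # Inception v3 returns two heads in train mode
        loss = criterion(logits, labels) + 0.4 * criterion(aux_logits, labels)
        loss.backward()
        optimizer.step()
        return loss.item()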
132. Skin cancer classification performance of the CNN and dermatologists.
[ROC-style plots of specificity vs sensitivity. Panel a (tested against
dermatologists): carcinoma, 135 images, algorithm AUC = 0.96 (25 dermatologists);
melanoma, 130 images, algorithm AUC = 0.94 (22 dermatologists); melanoma under
dermoscopy, 111 images, algorithm AUC = 0.91 (21 dermatologists); each plot marks
the individual dermatologists and their average. Panel b (full test sets):
carcinoma, 707 images, AUC = 0.96; melanoma, 225 images, AUC = 0.96; melanoma
under dermoscopy, 1,010 images, AUC = 0.94.]
Quite a few of the 21 dermatologists were less accurate than the algorithm,
and the dermatologists' average performance was also below the algorithm's.
133. Skin cancer classification performance of the CNN and dermatologists.
[Same figure as the previous slide.]
134. Skin Cancer Image Classification (TensorFlow Dev Summit 2017)
Skin cancer classification performance of
the CNN and dermatologists.
https://www.youtube.com/watch?v=toK1OSLep3s&t=419s
136. Diagnostic Concordance Among Pathologists
Interpreting Breast Biopsy Specimens
A / B / C / D
Benign without atypia / Atypia / DCIS (ductal carcinoma in situ) / Invasive carcinoma
Interpretation?
Elmore et al. JAMA 2015
137. [Figure 4. Participating pathologists' interpretations of each of the 240 breast
biopsy test cases, shown as the percentage of interpretations per case falling into
each category (benign without atypia, atypia, DCIS, invasive carcinoma). Panels:
A, benign without atypia (72 cases, 2,070 total interpretations); B, atypia (72 cases,
2,070 interpretations); C, DCIS (73 cases, 2,097 interpretations); D, invasive
carcinoma (23 cases, 663 interpretations). DCIS indicates ductal carcinoma in situ.]
Elmore et al. JAMA 2015
Diagnostic Concordance Among Pathologists
Interpreting Breast Biopsy Specimens
138. [Figure 4 repeated from the previous slide.]
Elmore et al. JAMA 2015
Diagnostic Concordance Among Pathologists
Interpreting Breast Biopsy Specimens
Diagnostic Concordance Among Pathologists
Interpreting Breast Biopsy Specimens
139. Elmore etl al. JAMA 2015
Diagnostic Concordance Among Pathologists
Interpreting Breast Biopsy Specimens
• Concordance noted in 5194 of 6900 case interpretations or 75.3%.
• Reference diagnosis was obtained from consensus of 3 experienced breast pathologists.
spentonthisactivitywas16(95%CI,15-17);43participantswere
awarded the maximum 20 hours.
Pathologists’ Diagnoses Compared With Consensus-Derived
Reference Diagnoses
The 115 participants each interpreted 60 cases, providing 6900
total individual interpretations for comparison with the con-
sensus-derived reference diagnoses (Figure 3). Participants
agreed with the consensus-derived reference diagnosis for
75.3% of the interpretations (95% CI, 73.4%-77.0%). Partici-
pants (n = 94) who completed the CME activity reported that
Patient and Pathologist Characteristics Associated With
Overinterpretation and Underinterpretation
The association of breast density with overall pathologists’
concordance (as well as both overinterpretation and under-
interpretation rates) was statistically significant, as shown
in Table 3 when comparing mammographic density grouped
into 2 categories (low density vs high density). The overall
concordance estimates also decreased consistently with
increasing breast density across all 4 Breast Imaging-
Reporting and Data System (BI-RADS) density categories:
BI-RADS A, 81% (95% CI, 75%-86%); BI-RADS B, 77% (95%
Figure 3. Comparison of 115 Participating Pathologists’ Interpretations vs the Consensus-Derived Reference
Diagnosis for 6900 Total Case Interpretationsa
Participating Pathologists’ Interpretation
ConsensusReference
Diagnosisb
Benign
without atypia Atypia DCIS
Invasive
carcinoma Total
Benign without atypia 1803 200 46 21 2070
Atypia 719 990 353 8 2070
DCIS 133 146 1764 54 2097
Invasive carcinoma 3 0 23 637 663
Total 2658 1336 2186 720 6900
DCIS indicates ductal carcinoma
in situ.
a
Concordance noted in 5194 of
6900 case interpretations or
75.3%.
b
Reference diagnosis was obtained
from consensus of 3 experienced
breast pathologists.
Diagnostic Concordance in Interpreting Breast Biopsies Original Investigation Research
Comparison of 115 Participating Pathologists’ Interpretations vs
the Consensus-Derived Reference Diagnosis for 6900 Total Case Interpretations
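The headline 75.3% is simply the diagonal of this matrix over the total; a minimal Python sketch reproducing it, with per-class agreement as a bonus (counts copied from the table above):

    import numpy as np

    # Rows: consensus reference; columns: participant interpretation
    # Order: benign without atypia, atypia, DCIS, invasive carcinoma
    m = np.array([[1803, 200,   46,  21],
                  [ 719, 990,  353,   8],
                  [ 133, 146, 1764,  54],
                  [   3,   0,   23, 637]])

    print(f"overall concordance: {np.trace(m) / m.sum():.1%}")   # 75.3%
    per_class = np.diag(m) / m.sum(axis=1)
    print(dict(zip(["benign", "atypia", "DCIS", "invasive"], per_class.round(3))))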
140. [Figure 1. The C-Path image-analysis pipeline:
A. Basic image processing and feature construction: the H&E image is broken into
superpixels, and nuclei are identified within each superpixel.
B. Building an epithelial/stromal classifier: characteristics of epithelial nuclei
and cytoplasm and of stromal nuclei and matrix feed a classifier that separates
epithelium from stroma.
C. Constructing higher-level contextual/relational features: relationships between
epithelial nuclear neighbors; between morphologically regular and irregular nuclei;
between epithelial and stromal objects; between epithelial nuclei and cytoplasm;
and of contiguous epithelial regions with underlying nuclear objects.
D. Learning an image-based model to predict survival: processed images from patients
alive vs deceased at 5 years after surgery are used to build an L1-regularized
logistic regression model (the 5YS predictive model), which is then applied to
unlabeled test images to classify patients as at high or low risk of death by 5 years.]
[Study context: the TMAs contain 0.6-mm-diameter cores (median of two per case) that
sample only a small part of the full tumor; data came from two separate, independent
cohorts, Netherlands Cancer Institute (NKI; 248 patients) and Vancouver General
Hospital (VGH; 328 patients). Unlike previous work in cancer morphometry, the pipeline
is not limited to a predefined set of pathologist-selected morphometric features: an
automated, hierarchical scene segmentation generates thousands of measurements,
including standard morphometric descriptors of image objects plus higher-level
contextual, relational, and global image features.]
Digital Pathologist
Sci Transl Med. 2011 Nov 9;3(108):108ra113
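The 5YS model in step D above is, at its core, a sparse (L1-regularized) logistic regression over thousands of image-derived features. A minimal Python sketch with synthetic stand-ins for the morphometric features and 5-year outcomes (the real pipeline feeds in the epithelial/stromal measurements extracted from the images):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(248, 6000))   # 248 NKI patients x thousands of image features (synthetic)
    y = rng.integers(0, 2, size=248)   # 1 = deceased at 5 years after surgery (synthetic labels)

    # The L1 penalty drives most feature weights to zero, selecting a sparse prognostic signature.
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
    risk = model.predict_proba(X)[:, 1]  # P(death by 5 years) -> threshold into high/low risk
    print("features kept:", np.count_nonzero(model.coef_))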
141. Digital Pathologist
Sci Transl Med. 2011 Nov 9;3(108):108ra113
Top stromal features associated with survival (Fig. 4):
• Variability in the absolute difference in intensity between stromal matrix regions
and their neighbors.
• The sum of the minimum green intensity value of stromal-contiguous regions: zero
when stromal regions contain dark pixels (such as inflammatory nuclei), positive when
they are devoid of them - suggesting that the presence of inflammatory cells in the
stroma is associated with poor prognosis, consistent with previous observations.
• The relative border between spindled and round stromal nuclei: an increased
relative border of spindled to round stromal nuclei is associated with worse overall
survival, suggesting that spatial relationships between different populations of
stromal cell types are associated with breast cancer progression.
Top epithelial features (Fig. 5, A-H; left panels improved prognosis, right panels
worse) include: the SD of the (SD of intensity / mean intensity) ratio for pixels
within a ring of the center of epithelial nuclei; the sum of the number of
unclassified objects; the SD of the maximum blue pixel value for atypical epithelial
nuclei; the maximum distance between atypical epithelial nuclei; the minimum elliptic
fit of epithelial contiguous regions; the SD of distance between epithelial
cytoplasmic and nuclear objects; the average border between epithelial cytoplasmic
objects; and the maximum value of the minimum green pixel intensity value in
epithelial contiguous regions.
Reproducibility of C-Path 5YS model predictions on samples with multiple TMA cores:
for the 190 VGH patients who contributed two images with complete data, the binary
high/low-risk predictions on the individual images agreed with each other in 69%
(131 of 190) of cases and with the prediction on the averaged data for 84% (319 of
380) of images. On the continuous prediction score (0-100), the median absolute
difference among replicates was 5%, with a Spearman correlation of 0.27 (P = 0.0002).
This only moderate intrapatient agreement points to significant intratumor
heterogeneity, a cardinal feature of breast carcinomas; accordingly, the 5YS model
was significantly more accurate on VGH cases that contributed multiple images than on
those with a single core, indicating that increased tumor sampling improves model
performance.
147. Clinical study on ISBI dataset
Error rates:
• Pathologist in competition setting: 3.5%
• Pathologists in clinical practice (n = 12): 13–26%
• Pathologists on micro-metastases (small tumors): 23–42%
• Beck Lab deep learning model: 0.65%
Beck Lab's deep learning model now outperforms pathologists.
Andrew Beck, Machine Learning for Healthcare, MIT 2017
149. Assisting Pathologists in Detecting
Cancer with Deep Learning
• The localization score (FROC) for the algorithm reached 89%, which significantly exceeded the score of 73% for a pathologist with no time constraint.
150. Assisting Pathologists in Detecting
Cancer with Deep Learning
• Algorithms need to be incorporated in a way that complements the pathologist’s workflow.
• Algorithms could improve the efficiency and consistency of pathologists.
• For example, pathologists could reduce their false negative rates (percentage of
undetected tumors) by reviewing the top ranked predicted tumor regions
including up to 8 false positive regions per slide.
151. Assisting Pathologists in Detecting
Cancer with Deep Learning
Input & model size         Validation                   Test
                           FROC   @8FP   AUC           FROC                 @8FP                 AUC
40X                        98.1   100    99.0          87.3 (83.2, 91.1)    91.1 (87.2, 94.5)    96.7 (92.6, 99.6)
40X-pretrained             99.3   100    100           85.5 (81.0, 89.5)    91.1 (86.8, 94.6)    97.5 (93.8, 99.8)
40X-small                  99.3   100    100           86.4 (82.2, 90.4)    92.4 (88.8, 95.7)    97.1 (93.2, 99.8)
ensemble-of-3              -      -      -             88.5 (84.3, 92.2)    92.4 (88.7, 95.6)    97.7 (93.0, 100)
20X-small                  94.7   100    99.6          85.5 (81.0, 89.7)    91.1 (86.9, 94.8)    98.6 (96.7, 100)
10X-small                  88.7   97.2   97.7          79.3 (74.2, 84.1)    84.9 (80.0, 89.4)    96.5 (91.9, 99.7)
40X+20X-small              94.9   98.6   99.0          85.9 (81.6, 89.9)    92.9 (89.3, 96.1)    97.0 (93.1, 99.9)
40X+10X-small              93.8   98.6   100           82.2 (77.0, 86.7)    87.6 (83.2, 91.7)    98.6 (96.2, 99.9)
Pathologist [1]            -      -      -             73.3*                73.3*                96.6
Camelyon16 winner [1, 23]  -      -      -             80.7                 82.7                 99.4
Table 1. Results on Camelyon16 dataset (95% confidence intervals, CI). Bold indicates results within the CI of the best model. “Small” models contain 300K parameters per Inception tower instead of 20M. -: not reported. *A pathologist achieved this sensitivity (with no FP) using 30 hours.
(up to 10–20% variance), and can confound evaluation of model improvements by grouping multiple nearby tumors as one. By contrast, our non-maxima suppression approach is relatively insensitive to r between 4 and 6, although less accurate models benefited from tuning r using the validation set (e.g., 8).
The FROC evaluates tumor detection and localization.
The FROC is defined as the average sensitivity at 0.25, 0.5, 1, 2, 4, and 8 average FPs per tumor-negative slide.
@8FP: sensitivity at 8 false positives per image.
Yun Liu et al. Detecting Cancer Metastases on Gigapixel Pathology Images (2017)
152. Assisting Pathologists in Detecting
Cancer with Deep Learning
Yun Liu et al. Detecting Cancer Metastases on Gigapixel Pathology Images (2017)
• Google's AI showed large improvements in @8FP and FROC (92.9% and 88.5%, respectively)
• @8FP: the sensitivity achievable when up to 8 false positives per slide are tolerated
• FROC: the average sensitivity when 1/4, 1/2, 1, 2, 4, and 8 FPs per slide are allowed (computed in the sketch below)
• In other words, if a few false positives are tolerated, the AI reaches very high sensitivity
• Human pathologists, by contrast, achieved 73% sensitivity but nearly 100% specificity
• Human pathologists and AI pathologists are good at different things
• If the two collaborate, improvements in reading efficiency, consistency, and sensitivity can be expected
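To make the two metrics concrete, here is a minimal Python sketch of how the @8FP sensitivity and a Camelyon16-style FROC could be computed from a list of scored candidate detections. The data layout and function name are illustrative assumptions, not code from Liu et al.

# Illustrative FROC sketch (assumed data layout): each detection is a tuple
# (slide_id, confidence score, matched tumor id, or None for a false positive).
import numpy as np

def froc_sensitivities(detections, n_tumors, n_negative_slides,
                       fp_budgets=(0.25, 0.5, 1, 2, 4, 8)):
    """Sensitivity reached within each allowed number of FPs per tumor-negative slide."""
    detections = sorted(detections, key=lambda d: -d[1])  # descending confidence
    found, fp_count, sens_at = set(), 0, {}
    for slide_id, score, tumor_id in detections:
        if tumor_id is None:
            fp_count += 1
        else:
            found.add(tumor_id)
        for budget in fp_budgets:
            # Freeze the sensitivity the moment a budget is exhausted.
            if budget not in sens_at and fp_count > budget * n_negative_slides:
                sens_at[budget] = len(found) / n_tumors
    for budget in fp_budgets:       # budgets never exhausted keep the final sensitivity
        sens_at.setdefault(budget, len(found) / n_tumors)
    return sens_at

sens = froc_sensitivities([("s1", 0.9, "t1"), ("s1", 0.7, None), ("s2", 0.4, "t2")],
                          n_tumors=2, n_negative_slides=10)
froc = np.mean(list(sens.values()))   # FROC: average sensitivity over the six budgets
sens_at_8fp = sens[8]                 # the "@8FP" number quoted on the slide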
157. Fig 1. What can consumer wearables do? Heart rate can be measured with an oximeter built into a ring [3], muscle activity with an electromyographic sensor embedded into clothing [4], stress with an electrodermal sensor incorporated into a wristband [5], and physical activity or sleep patterns via an accelerometer in a watch [6,7]. In addition, a female's most fertile period can be identified with detailed body temperature tracking [8], while levels of mental attention can be monitored with a small number of non-gelled electroencephalogram (EEG) electrodes [9].
PLOS Medicine 2016
158. Medical applications of artificial intelligence:
• Analyzing complex medical data and deriving insights
• Analyzing and reading medical imaging and pathology data
• Monitoring continuous data for prevention and prediction
163. SEPSIS
A targeted real-time early warning score (TREWScore) for septic shock
Katharine E. Henry, David N. Hager, Peter J. Pronovost, Suchi Saria
Sepsis is a leading cause of death in the United States, with mortality highest among patients who develop septic shock. Early aggressive treatment decreases morbidity and mortality. Although automated screening tools can detect patients currently experiencing severe sepsis and septic shock, none predict those at greatest risk of developing shock. We analyzed routinely available physiological and laboratory data from intensive care unit patients and developed “TREWScore,” a targeted real-time early warning score that predicts which patients will develop septic shock. TREWScore identified patients before the onset of septic shock with an area under the ROC (receiver operating characteristic) curve (AUC) of 0.83 [95% confidence interval (CI), 0.81 to 0.85]. At a specificity of 0.67, TREWScore achieved a sensitivity of 0.85 and identified patients a median of 28.2 [interquartile range (IQR), 10.6 to 94.2] hours before onset. Of those identified, two-thirds were identified before any sepsis-related organ dysfunction. In comparison, the Modified Early Warning Score, which has been used clinically for septic shock prediction, achieved a lower AUC of 0.73 (95% CI, 0.71 to 0.76). A routine screening protocol based on the presence of two of the systemic inflammatory response syndrome criteria, suspicion of infection, and either hypotension or hyperlactatemia achieved a lower sensitivity of 0.74 at a comparable specificity of 0.64. Continuous sampling of data from the electronic health records and calculation of TREWScore may allow clinicians to identify patients at risk for septic shock and provide earlier interventions that would prevent or mitigate the associated morbidity and mortality.
INTRODUCTION
Seven hundred fifty thousand patients develop severe sepsis and septic shock in the United States each year. More than half of them are admitted to an intensive care unit (ICU), accounting for 10% of all ICU admissions, 20 to 30% of hospital deaths, and $15.4 billion in annual health care costs (1–3). Several studies have demonstrated that morbidity, mortality, and length of stay are decreased when severe sepsis and septic shock are identified and treated early (4–8). In particular, one study showed that mortality from septic shock increased by 7.6% with every hour that treatment was delayed after the onset of hypotension (9).
More recent studies comparing protocolized care, usual care, and early goal-directed therapy (EGDT) for patients with septic shock suggest that usual care is as effective as EGDT (10–12). Some have interpreted this to mean that usual care has improved over time and reflects important aspects of EGDT, such as early antibiotics and early aggressive fluid resuscitation (13). It is likely that continued early identification and treatment will further improve outcomes. However, the Simplified Acute Physiology Score (SAPS II), Sequential Organ Failure Assessment (SOFA) scores, Modified Early Warning Score (MEWS), and Simple Clinical Score (SCS) have been validated to assess illness severity and risk of death among septic patients (14–17). Although these scores are useful for predicting general deterioration or mortality, they typically cannot distinguish with high sensitivity and specificity which patients are at highest risk of developing a specific acute condition.
The increased use of electronic health records (EHRs), which can be queried in real time, has generated interest in automating tools that identify patients at risk for septic shock (18–20). A number of “early warning systems,” “track and trigger” initiatives, “listening applications,” and “sniffers” have been implemented to improve detection and timeliness of therapy for patients with severe sepsis and septic shock (18, 20–23). Although these tools have been successful at detecting patients currently experiencing severe sepsis or septic shock, none predict which patients are at highest risk of developing septic shock.
The adoption of the Affordable Care Act has added to the growing excitement around predictive models derived from electronic health records.
164.
Fig. 2. ROC for detection of septic shock before onset in the validation
set. The ROC curve for TREWScore is shown in blue, with the ROC curve for
MEWS in red. The sensitivity and specificity performance of the routine
screening criteria is indicated by the purple dot. Normal 95% CIs are shown
for TREWScore and MEWS. TPR, true-positive rate; FPR, false-positive rate.
A targeted real-time early warning score (TREWScore) for septic shock
AUC = 0.83
At a specificity of 0.67, TREWScore achieved a sensitivity of 0.85 and identified patients a median of 28.2 hours before onset.
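To make the quoted operating point concrete, the minimal sketch below (toy data; variable names are assumptions, not the TREWScore implementation) shows how an AUC and a sensitivity at a target specificity are read off any continuous risk score with scikit-learn:

# Toy example: compute AUC and the sensitivity at a target specificity
# for a continuous risk score (illustrative only, not TREWScore itself).
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)               # 1 = went on to develop septic shock
score = y * 1.0 + rng.normal(0.0, 1.0, 1000)    # toy risk score, higher for true cases

fpr, tpr, thr = roc_curve(y, score)
print("AUC =", round(roc_auc_score(y, score), 2))

target_spec = 0.67
i = np.argmin(np.abs((1 - fpr) - target_spec))  # threshold closest to the target
print(f"sensitivity {tpr[i]:.2f} at specificity {1 - fpr[i]:.2f} (threshold {thr[i]:.2f})")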
167. In an early research project involving 600 patient cases, the team was able to
predict near-term hypoglycemic events up to 3 hours in advance of the symptoms.
IBM Watson-Medtronic
Jan 7, 2016
168. Sugar.IQ
Based on past records of the user's food intake, the resulting blood-glucose changes, insulin injections, and so on, Watson predicts how the user's blood glucose will change after a meal.
169. ADA 2017, San Diego, Courtesy of Taeho Kim (Seoul Medical Center)
170. ADA 2017, San Diego, Courtesy of Taeho Kim (Seoul Medical Center)
171. ADA 2017, San Diego, Courtesy of Taeho Kim (Seoul Medical Center)
172. ADA 2017, San Diego, Courtesy of Taeho Kim (Seoul Medical Center)
174. Prediction of Ventricular Arrhythmia
Collaboration with Prof. Segyeong Joo (Asan Medical Center)
Analyzed the “Physionet Spontaneous Ventricular Tachyarrhythmia Database” for 2.5 months (ongoing project)
Joo S, Choi KJ, Huh SJ, 2012, Expert Systems with Applications (Vol 39, Issue 3)
▪ Recurrent Neural Network with Only Frequency-Domain Transform
• Input: spectrogram with 129 features, obtained after ectopic-beat removal
• Stack of LSTM networks, with dropout between layers
• Binary cross-entropy loss
• Trained with RMSprop
• Prediction accuracy: 76.6% ➞ 89.6% (a sketch of this architecture follows)
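The slide fixes the ingredients (stacked LSTMs with dropout, 129 spectrogram features per frame, binary cross-entropy, RMSprop); everything else below, such as the number of frames, layer widths, and dropout rate, is an assumption. A minimal Keras sketch, not the authors' code:

# Minimal Keras sketch of the described architecture: stacked LSTMs with dropout,
# binary cross-entropy loss, RMSprop optimizer. Frame count and widths are assumed.
from tensorflow import keras
from tensorflow.keras import layers

n_frames, n_features = 30, 129   # 129 spectrogram features per frame (frame count assumed)

model = keras.Sequential([
    layers.Input(shape=(n_frames, n_features)),
    layers.LSTM(64, return_sequences=True),   # first LSTM in the stack
    layers.Dropout(0.5),
    layers.LSTM(64),                          # second LSTM
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),    # P(ventricular tachyarrhythmia ahead)
])
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])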
175. Prediction of Ventricular Tachycardia One Hour before Occurrence Using Artificial Neural Networks
Hyojeong Lee, Soo-Yong Shin, Myeongsook Seo, Gi-Byoung Nam & Segyeong Joo
Ventricular tachycardia (VT) is a potentially fatal tachyarrhythmia, which causes a rapid heartbeat as a result of improper electrical activity of the heart. This is a potentially life-threatening arrhythmia because it can cause low blood pressure and may lead to ventricular fibrillation, asystole, and sudden cardiac death. To prevent VT, we developed an early prediction model that can predict this event one hour before its onset using an artificial neural network (ANN) generated using 14 parameters obtained from heart rate variability (HRV) and respiratory rate variability (RRV) analysis. De-identified raw data from the monitors of patients admitted to the cardiovascular intensive care unit at Asan Medical Center between September 2013 and April 2015 were collected. The dataset consisted of 52 recordings obtained one hour prior to VT events and 52 control recordings. Two-thirds of the extracted parameters were used to train the ANN, and the remaining third was used to evaluate performance of the learned ANN. The developed VT prediction model proved its performance by achieving a sensitivity of 0.88, specificity of 0.82, and AUC of 0.93.
Sudden cardiac death (SCD) causes more than 300,000 deaths annually in the United States [1]. Coronary artery disease, cardiomyopathy, structural heart problems, Brugada syndrome, and long QT syndrome are well-known causes of SCD [1–4]. In addition, spontaneous ventricular tachyarrhythmia (VTA) is a main cause of SCD, contributing to about 80% of SCDs [5]. Ventricular tachycardia (VT) and ventricular fibrillation (VF) comprise VTA. VT is defined as a very rapid heartbeat (more than 100 times per minute), which does not allow enough time for the ventricles to fill with blood between beats. VT may terminate spontaneously after a few seconds; however, in some cases, VT can progress to a more dangerous or fatal arrhythmia, VF. Accordingly, early prediction of VT will help in reducing mortality from SCD by allowing for preventive care of VTA.
Several studies have reported attempts at predicting VTAs by assessing the occurrence of syncope, left ventricular systolic dysfunction, QRS (Q, R, and S wave in electrocardiogram) duration, QT (Q and T wave) dispersion, Holter monitoring, signal-averaged electrocardiograms (ECGs), heart rate variability (HRV), T-wave alternans, electrophysiologic testing, B-type natriuretic peptides, and other parameters or methods [6–10]. Among these studies, prediction of VTAs based on HRV analysis has recently emerged and shown potential for predicting VTA [11–13].
Previous studies have focused on the prediction of VT using HRV analysis. In addition, most studies assessed the statistical value of each parameter calculated on or prior to the VT event and parameters of control data, which were collected from Holter recordings and implantable cardioverter defibrillators (ICDs) [12,14,15]. However, the results were not satisfactory in predicting fatal events like VT.
To make a better prediction model of VT, it is essential to utilize multiple parameters from various methods of HRV analysis and to generate a classifier that can deal with complex patterns composed of such parameters [7]. An artificial neural network (ANN) is a valuable tool for classification of a database with multiple parameters. An ANN is a kind of machine learning algorithm that can be trained using data with multiple parameters [16]. After training, the ANN calculates an output value according to the input parameters, and this output value can be used
Lee H. et al., Scientific Reports, 2016
176. Prediction of Ventricular Tachycardia One Hour before Occurrence Using Artificial Neural Networks
…in pattern recognition or classification. ANN has not been widely used in medical analysis since the algorithm is not intuitive for physicians. However, utilization of ANN in medical research has recently emerged [17–19].
Parameter       Control dataset (n=110), Mean±SD   VT dataset (n=110), Mean±SD   p-Value
Mean NN (ms)    0.709±0.149                        0.718±0.158                   0.304
SDNN (ms)       0.061±0.042                        0.073±0.045                   0.013
RMSSD (ms)      0.068±0.053                        0.081±0.057                   0.031
pNN50 (%)       0.209±0.224                        0.239±0.205                   0.067
VLF (ms²)       4.1E-05±6.54E-05                   6.23E-05±9.81E-05             0.057
LF (ms²)        7.61E-04±1.16E-03                  1.04E-03±1.15E-03             0.084
HF (ms²)        1.53E-03±2.02E-03                  1.96E-03±2.16E-03             0.088
LF/HF           0.498±0.372                        0.533±0.435                   0.315
SD1 (ms)        0.039±0.029                        0.047±0.032                   0.031
SD2 (ms)        0.081±0.057                        0.098±0.06                    0.012
SD1/SD2         0.466±0.169                        0.469±0.164                   0.426
RPdM (ms)       2.73±0.817                         2.95±0.871                    0.038
RPdSD (ms)      0.721±0.578                        0.915±0.868                   0.075
RPdV            28.4±5.31                          25.4±3.56                     <0.002
Table 1. Comparison of HRV and RRV parameters between the control and VT datasets.
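For reference, the first few time-domain HRV parameters in Table 1 are simple functions of the NN (normal-to-normal) interval series. The sketch below uses the standard definitions; it is not the authors' implementation:

# Standard time-domain HRV features from an NN-interval series in seconds
# (textbook definitions matching the first rows of Table 1).
import numpy as np

def hrv_time_domain(nn_intervals):
    nn = np.asarray(nn_intervals, dtype=float)
    d = np.diff(nn)
    return {
        "MeanNN": nn.mean(),                  # mean NN interval
        "SDNN": nn.std(ddof=1),               # standard deviation of NN intervals
        "RMSSD": np.sqrt(np.mean(d ** 2)),    # RMS of successive differences
        "pNN50": np.mean(np.abs(d) > 0.050),  # fraction of successive diffs > 50 ms
    }

print(hrv_time_domain([0.71, 0.74, 0.69, 0.72, 0.78, 0.70]))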
ANN input            # Inputs  Sensitivity (%)  Specificity (%)  Accuracy (%)  PPV (%)       NPV (%)       AUC
HRV parameters       11        70.6 (12/17)     76.5 (13/17)     73.5 (25/34)  75.0 (12/16)  72.2 (13/18)  0.75
RRV parameters       3         82.4 (14/17)     82.4 (14/17)     82.4 (28/34)  82.4 (14/17)  82.4 (14/17)  0.83
HRV+RRV parameters   14        88.2 (15/17)     82.4 (14/17)     85.3 (29/34)  83.3 (15/18)  87.5 (14/16)  0.93
Table 2. Performance of three ANNs in predicting a VT event 1 hour before onset for the test dataset.
Lee H. et al., Scientific Reports, 2016
This ANN with 13 hidden neurons in one hidden layer showed the best performance (sketched below).
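The best model is specified exactly: 14 inputs (11 HRV + 3 RRV parameters), one hidden layer of 13 neurons, and one output. A minimal sketch of that topology; the activations and optimizer are assumptions, since the excerpt does not state them:

# Sketch of the best-performing ANN: 14 inputs, 13 hidden neurons, 1 output.
# Activations and optimizer are assumed, not taken from the paper.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(14,)),              # 11 HRV + 3 RRV parameters
    layers.Dense(13, activation="tanh"),    # single hidden layer of 13 neurons
    layers.Dense(1, activation="sigmoid"),  # P(VT within one hour)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC(name="auc")])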
177.
ROC curve of three ANNs (dashed line, with only HRV parameters; dash-dot line, with only RRV parameters; solid line, with HRV and RRV parameters; dotted line, reference) used in the prediction of a VT event one hour before onset.
Prediction of Ventricular Tachycardia One Hour before
Occurrence Using Artificial Neural Networks
Lee H. et al., Scientific Reports, 2016
178. • 80 beds in three units of Ajou University Hospital: the trauma center, the emergency room, and the medical ICU
• Eight kinds of patient vital-sign data, including oxygen saturation, blood pressure, pulse, EEG, and body temperature, are integrated into a single store
• AI monitors and analyzes these vital signs in real time to predict events 1–3 hours in advance
• Target conditions include arrhythmia, sepsis, acute respiratory distress syndrome (ARDS), and unplanned intubation
183. Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks
Pranav Rajpurkar*, Awni Y. Hannun*, Masoumeh Haghpanahi, Codie Bourn, Andrew Y. Ng
Abstract
We develop an algorithm which exceeds the performance of board certified cardiologists in detecting a wide range of heart arrhythmias from electrocardiograms recorded with a single-lead wearable monitor. We build a dataset with more than 500 times the number of unique patients than previously studied corpora. On this dataset, we train a 34-layer convolutional neural network which maps a sequence of ECG samples to a sequence of rhythm classes. Committees of board-certified cardiologists annotate a gold standard test set on which we compare the performance of our model to that of 6 other individual cardiologists. We exceed the average cardiologist performance in both recall (sensitivity) and precision (positive predictive value).
1. Introduction
We develop a model which can diagnose irregular heart
Figure 1. Our trained convolutional neural network correctly detecting the sinus rhythm (SINUS) and Atrial Fibrillation (AFIB) from this ECG recorded with a single-lead wearable heart monitor.
Arrhythmia detection from ECG recordings is usually performed by expert technicians and cardiologists given the
arXiv:1707.01836v1 [cs.CV] 6 Jul 2017
185. Cardiologist-Level Arrhythmia Detection
with Convolutional Neural Networks
• Training set
• About 64,000 ECG records collected from roughly 30,000 patients
• A 34-layer-deep CNN trained on these data
• Test set
• Zio Patch ECG data from 336 patients
• Ground truth set by consensus of three cardiologists
• Classified into a total of 12 rhythm classes
• 6 cardiologists vs. the AI, compared on:
• whether an arrhythmia occurred
• the type of arrhythmia
(A toy sketch of the sequence-to-sequence CNN idea follows.)
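As a rough illustration of the sequence-to-sequence idea (an ECG sample sequence mapped to per-segment rhythm classes), here is a deliberately tiny Keras sketch. The sampling rate, layer widths, and depth are assumptions; the actual model is a 34-layer residual network:

# Toy 1-D CNN in the spirit of the slide: a single-lead ECG sequence in, a
# sequence of 12-way rhythm-class predictions out. Far smaller than the paper's
# 34-layer residual network; all shapes and widths are assumptions.
from tensorflow import keras
from tensorflow.keras import layers

n_samples, n_classes = 256 * 30, 12   # 30 s at an assumed 256 Hz; 12 rhythm classes

inputs = keras.Input(shape=(n_samples, 1))
x = inputs
for filters in (32, 64, 128, 256):    # strided conv blocks downsample over time
    x = layers.Conv1D(filters, kernel_size=16, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
outputs = layers.Conv1D(n_classes, 1, activation="softmax")(x)  # per-segment rhythm labels

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")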
186. Cardiologist-Level Arrhythmia Detection
with Convolutional Neural Networks
Figure 3. Evaluated on the test set, the model outperforms the average cardiologist.