13. • AP (Associated Press): robots write articles in place of human reporters
• Capable of producing 2,000 articles per second
• Coverage expanded from earnings reports for 300 companies to 3,000 companies
14. • 1978
• As part of the obscure task of “discovery” — providing documents relevant to a lawsuit — the studios examined six million documents at a cost of more than $2.2 million, much of it to pay for a platoon of lawyers and paralegals who worked for months at high hourly rates.
• 2011
• Now, thanks to advances in artificial intelligence, “e-discovery” software can analyze documents in a fraction of the time for a fraction of the cost.
• In January, for example, Blackstone Discovery of Palo Alto, Calif., helped analyze 1.5 million documents for less than $100,000.
19. • Weak AI (Artificial Narrow Intelligence)
• AI that performs well in a specific domain
• Chess, quiz shows, e-mail filtering, product recommendation, autonomous driving
• Strong AI (Artificial General Intelligence)
• Human-level AI across all domains
• Reasoning, planning, problem solving, abstraction, learning complex concepts
• Artificial Super Intelligence
• AI that surpasses humans in every area, including science, technology and social skills
• “Any sufficiently advanced technology is indistinguishable from magic.” - Arthur C. Clarke
21. • Weak AI (Artificial Narrow Intelligence)
• AI that performs well in a specific domain
• Chess, quiz shows, e-mail filtering, product recommendation, autonomous driving
• Strong AI (Artificial General Intelligence)
• Human-level AI across all domains
• Reasoning, planning, problem solving, abstraction, learning complex concepts
• Artificial Super Intelligence
• AI that surpasses humans in every area, including science, technology and social skills
• “Any sufficiently advanced technology is indistinguishable from magic.” - Arthur C. Clarke
31. IBM Watson on Medicine
Watson learned:
• 600,000 pieces of medical evidence
• 2 million pages of text from 42 medical journals and clinical trials
• 69 guidelines, 61,540 clinical trials
plus
• 1,500 lung cancer cases: physician notes, lab results and clinical research
• 14,700 hours of hands-on training
35. • Trained with 400 historical patient cases
• Assessed the accuracy of OEA treatment suggestions, using MD Anderson physicians’ decisions as the benchmark
• When 200 leukemia cases were tested:
• False positive rate = 2.9% (cases where the OEA-recommended treatment was incorrect)
• False negative rate = 0.4% (cases where the correct treatment received a low score)
• Overall accuracy of treatment recommendation = 82.6%
• Conclusion: the suggested personalized treatment options showed reasonably high accuracy
MD Anderson’s Oncology Expert Advisor Powered by IBM Watson: A Web-Based Cognitive Clinical Decision Support Tool
Koichi Takahashi, MD (ASCO 2014)
36. Annals of Oncology (2016) 27 (suppl_9): ix179-ix180. 10.1093/annonc/mdw601
Validation study to assess performance of IBM cognitive computing system Watson for oncology with Manipal multidisciplinary tumour board for 1000 consecutive cases: An Indian experience
• MMDT (Manipal multidisciplinary tumour board) treatment recommendations and data for 1,000 cases of 4 different cancers, breast (638), colon (126), rectum (124) and lung (112), treated in the last 3 years were collected.
• Of the treatment recommendations given by the MMDT, WFO classified 50% as REC, 28% as FC and 17% as NREC
• Nearly 80% of the recommendations fell in the WFO REC and FC groups
• 5% of the treatments given by the MMDT were not available in WFO
• The degree of concordance varied depending on the type of cancer:
• WFO-REC was highest for rectum (85%) and lowest for lung (17.8%)
• High for TNBC (67.9%); lower for HER2-negative (35%)
• WFO took a median of 40 seconds to capture, analyze and give a treatment recommendation (vs. a median of 15 minutes for the MMDT)
38. IBM Watson in Healthcare: milestones, 2011–2017 (the original slide lays these out on a timeline with separate Academia and Business tracks)
• Wins Jeopardy! (2011)
• Collaboration with Memorial Sloan Kettering Cancer Center, New York (lung cancer)
• Collaboration with MD Anderson (leukemia); MD Anderson pilot results later presented at ASCO
• Watson Fund invests in WellTok ($22M), Pathway Genomics, and Modernizing Medicine
• Collaboration with The New York Genome Center (glioblastoma analysis)
• Collaboration with Cleveland Clinic (cancer genome analysis)
• GeneMD selected as winner of the Watson Mobile Developer Challenge
• IBM Korea establishes a Watson business unit
• Watson Health launched (2015); acquisitions of Phytel & Explorys, Merge Healthcare, and Truven Health
• Partnerships with J&J, Apple, Medtronic, Under Armour, and the pharmaceutical company Teva; alliance with Epic & Mayo Clinic (EHR data analysis); collaboration with Mayo Clinic (clinical trial matching); partnership with 14 cancer centers (cancer genome analysis)
• Pathway Genomics OME closed alpha begins; sleep study launched via Apple ResearchKit
• Korea’s MFDS drafts artificial intelligence guidelines; Medtronic demonstrates a glucose-management app; Medtronic Sugar.IQ launched
• Hospital adoptions: University of Tokyo (oncology), Manipal Hospital in India, Bumrungrad International Hospital in Thailand, Gachon University Gil Medical Center (oncology), and Pusan National University Hospital (oncology/genomics, 2017)
Yoon Sup Choi, Ph.D., Director, Digital Healthcare Institute; Managing Partner, Digital Healthcare Partners (yoonsup.choi@gmail.com)
40. • Many hospitals and healthcare services around the world are using Watson
• Three divisions: Oncology, Genomics, and Clinical Trial Matching (plus additional capabilities)
• Gachon University Gil Medical Center also began treating patients with Watson for Oncology in November 2016
2016.12 Connected Health Conference, Washington DC
45. • Weakening of human physicians’ authority due to artificial intelligence
• Growth of patients’ right to self-determination and of patient empowerment
• Need for changes in how physicians practice and how they are educated
46. • What happens when the physician and Watson disagree?
• Watson appears to give recommendations that differ from the NCCN guidelines
• About 5 cases out of some 100 patients
• Can the patient’s judgment be considered rational?
• Watson’s accuracy has not been validated
• Likely influenced by buzzwords such as the ‘Fourth Industrial Revolution’
• Shouldn’t clinical trials be required?
• Patients’ preferences affect the adoption rate of artificial intelligence
• Factors that influence hospital adoption:
• Analytical validity
• Clinical validity/utility
• Physicians’ perceptions/psychological factors
• Patients’ perceptions/psychological factors
• Regulatory environment (approval, reimbursement, etc.)
• Ultimately, if patients want it, hospital adoption can only continue to grow, regardless of whether it is medically justified
47. • The response to Watson has been much better than expected
• 85 cancer patients treated within 2 months of adoption
• Likely a faster pace than Gil Medical Center’s own projections
• Reportedly, inquiries about transfers from the ‘Big 5’ hospitals to Gil Medical Center are increasing
• Professors are said to discuss cases more actively and see patients more attentively
48. • Pusan National University Hospital has adopted two Watson solutions:
• Watson for Oncology
• Watson for Genomics
51. Fig. 4 (Russakovsky et al.): Random selection of images in the ILSVRC detection validation set. The images in the top 4 rows were taken from the ILSVRC2012 single-object localization validation set, and the images in the bottom 4 rows were collected from Flickr using scene-level queries.
http://arxiv.org/pdf/1409.0575.pdf
52. • Main competition
• Classification: classify the objects in the image
• Localization: classify ‘one’ object in the image and find its location
• Object detection: classify ‘all’ objects in the image and find their locations
Fig. 7 (Russakovsky et al.): Tasks in ILSVRC. The first column shows the ground truth labeling on an example image, and the next three show three sample outputs with the corresponding evaluation score.
http://arxiv.org/pdf/1409.0575.pdf
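The localization and detection tasks above are scored by the overlap between a predicted box and the ground-truth box; ILSVRC counts a prediction as correct when its intersection-over-union (IoU) with the ground truth is at least 0.5. A minimal sketch of that criterion (the box format and helper names are ours, not from the paper):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def localization_correct(pred, truth, threshold=0.5):
    # ILSVRC-style criterion: at least 50% overlap with the ground truth
    return iou(pred, truth) >= threshold
```

Detection additionally requires this per object and per class, with average precision as the summary metric.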
53. Performance of winning entries in the ILSVRC 2010–2015 competitions in each of the three tasks
(Charts: image classification, classification error, 2010–2015; single-object localization, localization error, 2011–2015; object detection, average precision, 2013–2015)
http://image-net.org/challenges/LSVRC/2015/results#loc
58. DeepFace: Closing the Gap to Human-Level Performance in Face Verification
Taigman, Y. et al. (2014). DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR’14.
Figure 2. Outline of the DeepFace architecture. A front-end of a single convolution-pooling-convolution filtering on the rectified input, followed by three locally-connected layers and two fully-connected layers. Colors illustrate feature maps produced at each layer. The net includes more than 120 million parameters, where more than 95% come from the local and fully connected layers.
The first layers have very few parameters and merely expand the input into a set of simple local features. The subsequent layers (L4, L5 and L6) are instead locally connected [13, 16]: like a convolutional layer they apply a filter bank, but every location in the feature map learns a different set of filters, since different regions of an aligned image have different local statistics.
The goal of training is to maximize the probability of the correct class (face id) by minimizing the cross-entropy loss for each training sample. If k is the index of the true label for a given input, the loss is L = −log p_k, minimized by computing the gradient of L with respect to the parameters.
Human: 95% vs. DeepFace (Facebook): 97.35%
Recognition accuracy on the Labeled Faces in the Wild (LFW) dataset (13,233 images, 5,749 people)
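The training objective quoted above, minimizing L = −log p_k where p_k is the softmax probability of the true class, can be sketched in a few lines. This is the generic softmax cross-entropy, not DeepFace's actual implementation:

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, k):
    """L = -log p_k, where k indexes the true class (face id)."""
    return -math.log(softmax(logits)[k])
```

Minimizing this loss over all training samples maximizes the probability assigned to the correct face identity.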
59. FaceNet: A Unified Embedding for Face Recognition and Clustering
Schroff, F. et al. (2015). FaceNet: A Unified Embedding for Face Recognition and Clustering
Human: 95% vs. FaceNet (Google): 99.63%
Recognition accuracy on the Labeled Faces in the Wild (LFW) dataset (13,233 images, 5,749 people)
Figure 6. LFW errors (false accepts and false rejects). This shows all pairs of images that were incorrectly classified on LFW. Only eight of the 13 errors shown here are actual errors; the other four are mislabeled in LFW.
5.7. Performance on YouTube Faces DB
We use the average similarity of all pairs of the first one hundred frames that our face detector detects in each video. This gives us a classification accuracy of 95.12% ± 0.39. Using the first one thousand frames results in 95.18%. Compared to [17] (91.4%), who also evaluate one hundred frames per video, we reduce the error rate by almost half. DeepId2+ [15] achieved 93.2%, and our method reduces this error by 30%, comparable to our improvement on LFW.
5.8. Face Clustering
Our compact embedding lends itself to be used in order to cluster a user’s personal photos into groups of people with the same identity. The constraints in assignment imposed by clustering faces, compared to the pure verification task, lead to truly amazing results. Figure 7 shows one cluster in a user’s personal photo collection, generated using agglomerative clustering. It is a clear showcase of the incredible invariance to occlusion, lighting, pose and even age.
Figure 7. Face Clustering. Shown is an exemplar cluster for one user. All these images in the user’s personal photo collection were clustered together.
6. Summary
We provide a method to directly learn an embedding into a Euclidean space for face verification. This sets it apart from other methods [15, 17] that use the CNN bottleneck layer, or require additional post-processing such as concatenation of multiple models and PCA, as well as SVM classification. Our end-to-end training both simplifies the setup and shows that directly optimizing a loss relevant to the task at hand improves performance.
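FaceNet's verification rule follows directly from the embedding: two faces match when the Euclidean distance between their embeddings falls below a threshold, and the same distance drives the agglomerative clustering above. A minimal sketch (the embeddings and the threshold value are placeholders for illustration; the paper tunes its threshold on held-out data):

```python
import math

def l2_distance(e1, e2):
    """Euclidean distance between two face embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))

def same_identity(e1, e2, threshold=1.1):
    # threshold=1.1 is a made-up value; FaceNet selects it on validation data
    return l2_distance(e1, e2) < threshold
```

The appeal of this setup is that verification, recognition and clustering all reduce to distance comparisons in the embedding space.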
60. Show and Tell: A Neural Image Caption Generator
Vinyals, O. et al. (2015). Show and Tell: A Neural Image Caption Generator, arXiv:1411.4555
Figure 1. NIC, our model, is based end-to-end on a neural network consisting of a vision CNN followed by a language-generating RNN. Example captions: “A group of people shopping at an outdoor market.” “There are many vegetables at the fruit stand.”
61. Show and Tell: A Neural Image Caption Generator
Vinyals, O. et al. (2015). Show and Tell: A Neural Image Caption Generator, arXiv:1411.4555
Figure 5. A selection of evaluation results, grouped by human rating.
65. Business Area: Medical Image Analysis
VUNOnet and our machine learning technology will help doctors and hospitals manage medical scans and images intelligently to make diagnosis faster and more accurate.
(Figure: original image vs. automatic segmentation; tissue classes: Normal, Emphysema, Reticular Opacity)
Our system finds DILDs at the highest accuracy (*DILDs: Diffuse Interstitial Lung Disease)
Digital Radiologist
Collaboration with Prof. Joon Beom Seo (Asan Medical Center)
Analysed 1,200 patients over 3 months
67. Digital Radiologist
Med Phys. 2013 May;40(5):051912. doi: 10.1118/1.4802214.
Collaboration with Prof. Joon Beom Seo (Asan Medical Center)
Analysed 1200 patients for 3 months
69. Medical Image Analysis using Deep Learning (VUNO, Inc.)
Eye Disease Diagnosis
• Initial screening is conducted by non-specialist general doctors, and the ophthalmologist only sees patients screened by these non-experts.
• For check-up centers this means double reading, which doubles the cost.
• The false positive rate is increased in order to enhance sensitivity.
Bone Age Detection
• The assessment process is done by manually referencing a guide book.
• Re-confirmation is conducted by pediatric endocrinology after the first reading by radiologists.
• Misassessments are frequent even for experienced radiologists.
Bone Density Diagnosis during Abdominal CT Scanning
• When an abdominal CT is taken, bone information including spine status can also be extracted.
• Radiologists only look at organs during abdominal CT reading, which wastes the chance of detecting bone-related disease.
71. Diabetic Retinopathy
• A representative complication of diabetes: occurs in 90% of patients who have had diabetes for 30 years or more
• Ophthalmologists photograph the fundus (the interior of the eyeball) and read the images
• Diagnosis is based on the degree of retinal microvessel formation, hemorrhage, and exudates
72. Copyright 2016 American Medical Association. All rights reserved.
Development and Validation of a Deep Learning Algorithm
for Detection of Diabetic Retinopathy
in Retinal Fundus Photographs
Varun Gulshan, PhD; Lily Peng, MD, PhD; Marc Coram, PhD; Martin C. Stumpe, PhD; Derek Wu, BS; Arunachalam Narayanaswamy, PhD;
Subhashini Venugopalan, MS; Kasumi Widner, MS; Tom Madams, MEng; Jorge Cuadros, OD, PhD; Ramasamy Kim, OD, DNB;
Rajiv Raman, MS, DNB; Philip C. Nelson, BS; Jessica L. Mega, MD, MPH; Dale R. Webster, PhD
IMPORTANCE Deep learning is a family of computational methods that allow an algorithm to
program itself by learning from a large set of examples that demonstrate the desired
behavior, removing the need to specify rules explicitly. Application of these methods to
medical imaging requires further assessment and validation.
OBJECTIVE To apply deep learning to create an algorithm for automated detection of diabetic
retinopathy and diabetic macular edema in retinal fundus photographs.
DESIGN AND SETTING A specific type of neural network optimized for image classification
called a deep convolutional neural network was trained using a retrospective development
data set of 128 175 retinal images, which were graded 3 to 7 times for diabetic retinopathy,
diabetic macular edema, and image gradability by a panel of 54 US licensed ophthalmologists
and ophthalmology senior residents between May and December 2015. The resultant
algorithm was validated in January and February 2016 using 2 separate data sets, both
graded by at least 7 US board-certified ophthalmologists with high intragrader consistency.
EXPOSURE Deep learning–trained algorithm.
MAIN OUTCOMES AND MEASURES The sensitivity and specificity of the algorithm for detecting
referable diabetic retinopathy (RDR), defined as moderate and worse diabetic retinopathy,
referable diabetic macular edema, or both, were generated based on the reference standard
of the majority decision of the ophthalmologist panel. The algorithm was evaluated at 2
operating points selected from the development set, one selected for high specificity and
another for high sensitivity.
RESULTS The EyePACS-1 data set consisted of 9963 images from 4997 patients (mean age, 54.4 years; 62.2% women; prevalence of RDR, 683/8878 fully gradable images [7.8%]); the Messidor-2 data set had 1748 images from 874 patients (mean age, 57.6 years; 42.6% women; prevalence of RDR, 254/1745 fully gradable images [14.6%]). For detecting RDR, the algorithm had an area under the receiver operating curve of 0.991 (95% CI, 0.988-0.993) for EyePACS-1 and 0.990 (95% CI, 0.986-0.995) for Messidor-2. Using the first operating cut point with high specificity, for EyePACS-1, the sensitivity was 90.3% (95% CI, 87.5%-92.7%) and the specificity was 98.1% (95% CI, 97.8%-98.5%). For Messidor-2, the sensitivity was 87.0% (95% CI, 81.1%-91.0%) and the specificity was 98.5% (95% CI, 97.7%-99.1%). Using a second operating point with high sensitivity in the development set, for EyePACS-1 the sensitivity was 97.5% and specificity was 93.4% and for Messidor-2 the sensitivity was 96.1% and specificity was 93.9%.
CONCLUSIONS AND RELEVANCE In this evaluation of retinal fundus photographs from adults
with diabetes, an algorithm based on deep machine learning had high sensitivity and
specificity for detecting referable diabetic retinopathy. Further research is necessary to
determine the feasibility of applying this algorithm in the clinical setting and to determine
whether use of the algorithm could lead to improved care and outcomes compared with
current ophthalmologic assessment.
JAMA. doi:10.1001/jama.2016.17216
Published online November 29, 2016.
73. Training Set / Test Set
• A CNN was trained retrospectively on 128,175 fundus images
• Data graded 3–7 times by 54 US ophthalmologists
• The algorithm’s readings were compared with those of panels of 7–8 top ophthalmologists
• Validation sets: EyePACS-1 (9,963 images), Messidor-2 (1,748 images)
eFigure 2. Screenshot of the second screen of the grading tool, which asks graders to assess the image for DR, DME and other notable conditions or findings (with a fullscreen mode, a reset control that reloads the image and clears all grading, and a comment box for other pathologies).
74. • AUC for EyePACS-1 and Messidor-2 = 0.991 and 0.990
• Sensitivity and specificity on par with the panel of 7–8 ophthalmologists
• F-score: 0.95 (vs. 0.91 for the human physicians)
Figure 2. Validation Set Performance for Referable Diabetic Retinopathy. Performance of the algorithm (black curve) and ophthalmologists (colored circles) for the presence of referable diabetic retinopathy (moderate or worse diabetic retinopathy or referable diabetic macular edema) on A, EyePACS-1 (8788 fully gradable images; AUC, 99.1%; 95% CI, 98.8%-99.3%) and B, Messidor-2 (1745 fully gradable images; AUC, 99.0%; 95% CI, 98.6%-99.5%). The black diamonds correspond to the sensitivity and specificity of the algorithm at the high-sensitivity and high-specificity operating points. In A, for the high-sensitivity operating point, specificity was 93.4% (95% CI, 92.8%-94.0%) and sensitivity was 97.5% (95% CI, 95.8%-98.7%); for the high-specificity operating point, specificity was 98.1% (95% CI, 97.8%-98.5%) and sensitivity was 90.3% (95% CI, 87.5%-92.7%). In B, for the high-sensitivity operating point, specificity was 93.9% (95% CI, 92.4%-95.3%) and sensitivity was 96.1% (95% CI, 92.4%-98.3%); for the high-specificity operating point, specificity was 98.5% (95% CI, 97.7%-99.1%) and sensitivity was 87.0% (95% CI, 81.1%-91.0%). There were 8 ophthalmologists who graded EyePACS-1 and 7 ophthalmologists who graded Messidor-2. AUC indicates area under the receiver operating characteristic curve.
The effects of data set size on algorithm performance were examined and shown to plateau at around 60,000 images.
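The two operating points above are simply two thresholds on the same algorithm score; sensitivity and specificity at any cut point can be computed as below (the scores and labels here are toy values, not the study's data):

```python
def sensitivity_specificity(scores, labels, threshold):
    """labels: 1 = referable DR present, 0 = absent; score >= threshold flags positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)
```

Lowering the threshold moves the algorithm toward the high-sensitivity operating point; raising it moves toward the high-specificity one, which is exactly the trade-off traced by the ROC curve in Figure 2.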
77. Dermatologist-level classification of skin cancer (Esteva et al., Nature letter)
• On the first validation task, the CNN achieves 72.1 ± 0.9% (mean ± s.d.) overall accuracy (the average of individual inference-class accuracies), while two dermatologists attain 65.56% and 66.0% accuracy on a subset of the validation set. Second, the algorithm is validated on a nine-class disease partition (the second-level taxonomy nodes), which groups diseases with similar medical treatment plans.
• Tests were run in two trials, one using standard images and the other using dermoscopy images, reflecting the two steps a dermatologist might take to obtain a clinical impression; the same CNN is used for all tasks. Example images demonstrate the difficulty of distinguishing between malignant and benign lesions, which share many visual features. The comparison metrics are sensitivity and specificity.
Figure: Deep CNN layout. An image of a skin lesion (for example, melanoma) is sequentially warped into a probability distribution over clinical classes of skin disease using the Google Inception v3 architecture, pretrained on the ImageNet dataset (1.28 million images, 1,000 generic object classes) and fine-tuned on a dataset of 129,450 skin lesions comprising 2,032 different diseases. Training classes (757) are defined using a novel taxonomy of skin disease and a partitioning algorithm that maps diseases into training classes (for example, acral-lentiginous melanoma, amelanotic melanoma, lentigo melanoma). Inference classes are more general and are composed of one or more training classes (for example, malignant melanocytic lesion, the class of melanomas). The probability of an inference class is computed by summing the probabilities of its training classes (see Methods). Example output: 92% malignant melanocytic lesion, 8% benign melanocytic lesion. (Inception v3 diagram reproduced from https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html)
GoogleNet Inception v3
• The authors built their own dataset of 129,450 skin-lesion images
• Data curated by 18 US dermatologists
• Images learned with a CNN (Inception v3)
• The algorithm’s readings were compared with those of 21 dermatologists:
• Distinguishing keratinocyte carcinoma from benign seborrheic keratosis
• Distinguishing malignant melanoma from benign lesions (based on standard images)
• Distinguishing malignant melanoma from benign lesions (based on dermoscopy images)
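The caption's rule for turning fine-grained training-class probabilities into inference-class probabilities is plain summation over the taxonomy. A sketch of that aggregation (the class names and probabilities are illustrative, not the paper's data):

```python
def inference_probs(training_probs, taxonomy):
    """taxonomy maps each inference class to its member training classes."""
    return {
        inference_class: sum(training_probs[t] for t in members)
        for inference_class, members in taxonomy.items()
    }

# Illustrative two-way taxonomy over four training classes
taxonomy = {
    "malignant melanocytic lesion": ["acral-lentiginous melanoma", "amelanotic melanoma"],
    "benign melanocytic lesion": ["blue nevus", "halo nevus"],
}
training_probs = {
    "acral-lentiginous melanoma": 0.55,
    "amelanotic melanoma": 0.37,
    "blue nevus": 0.05,
    "halo nevus": 0.03,
}
```

With these numbers the model would report 92% malignant vs. 8% benign, mirroring the example output in the figure.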
78. Skin cancer classification performance of the CNN and dermatologists (sensitivity vs. specificity curves)
a. Full test sets: carcinoma, 707 images (algorithm AUC = 0.96); melanoma, 225 images (AUC = 0.96); melanoma, 1,010 dermoscopy images (AUC = 0.94)
b. Subsets read by dermatologists: carcinoma, 135 images (AUC = 0.96; 25 dermatologists); melanoma, 130 images (AUC = 0.94; 22 dermatologists); melanoma, 111 dermoscopy images (AUC = 0.91; 21 dermatologists); each panel also plots the individual dermatologists and the average dermatologist
• A considerable number of the 21 dermatologists were less accurate than the algorithm
• The dermatologists’ average performance was also below that of the algorithm
79. Skin cancer classification performance of the CNN and dermatologists (figure repeated from the previous slide)
81. Diagnostic Concordance Among Pathologists Interpreting Breast Biopsy Specimens (Elmore et al., JAMA 2015)
Figure 4. Participating pathologists’ interpretations of each of the 240 breast biopsy test cases, shown as the percentage of interpretations per case falling into each diagnostic category (benign without atypia, atypia, DCIS, invasive carcinoma):
A. Benign without atypia: 72 cases, 2,070 total interpretations
B. Atypia: 72 cases, 2,070 total interpretations
C. DCIS: 73 cases, 2,097 total interpretations
D. Invasive carcinoma: 23 cases, 663 total interpretations
DCIS indicates ductal carcinoma in situ.
The overall agreement between the individual pathologists’ interpretations and the expert consensus–derived reference diagnoses was 75.3% (total 240 cases)
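The 75.3% figure is plain percent agreement: the share of individual interpretations that match the consensus reference diagnosis. A sketch with made-up diagnoses:

```python
def percent_agreement(interpretations, reference):
    """Share (%) of interpretations matching the reference diagnoses, pairwise by case."""
    matches = sum(1 for a, b in zip(interpretations, reference) if a == b)
    return 100.0 * matches / len(reference)
```

The study reports this across all pathologist-case pairs; here one mismatched diagnosis out of four yields 75%.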
82. Digital Pathologist (C-Path)
Figure 1 (pipeline). (A) Basic image processing and feature construction: the H&E image is broken into superpixels, and nuclei are identified within each superpixel. (B) Building an epithelial/stromal classifier. (C) Constructing higher-level contextual/relational features: relationships between epithelial nuclear neighbors; between morphologically regular and irregular nuclei; between epithelial and stromal objects; between epithelial nuclei and cytoplasm; relationships of contiguous epithelial regions with underlying nuclear objects; characteristics of epithelial nuclei and epithelial cytoplasm; characteristics of stromal nuclei and stromal matrix. (D) Learning an image-based model to predict survival: processed images from patients alive at 5 years and from patients deceased at 5 years feed L1-regularized logistic regression model building, yielding a 5-year-survival (5YS) predictive model that outputs P(survival) for unlabeled images, enabling identification of novel prognostically important morphologic features.
TMAs contain 0.6-mm-diameter cores (median of two cores per case) that represent only a small sample of the full tumor. We acquired data from two separate and independent cohorts: Netherlands Cancer Institute (NKI; 248 patients) and Vancouver General Hospital (VGH; 328 patients). Unlike previous work in cancer morphometry (18–21), our image analysis pipeline was not limited to a predefined set of morphometric features selected by pathologists. Rather, C-Path measures an extensive, quantitative feature set from the breast cancer epithelium and the stroma (Fig. 1). Our image processing system first performed an automated, hierarchical scene segmentation that generated thousands of measurements, including both standard morphometric descriptors of image objects and higher-level contextual, relational, and global image features. The pipeline consisted of three stages (Fig. 1, A to C, and tables S8 and S9). First, we used a set of processing steps to separate the tissue from the background, partition the image into small regions of coherent appearance known as superpixels, and find nuclei within the superpixels.
Figure legend (color key for basic cellular morphologic properties): epithelial regular nuclei = red; epithelial atypical nuclei = pale blue; epithelial cytoplasm = purple; stromal matrix = green; stromal round nuclei = dark green; stromal spindled nuclei = teal blue; unclassified regions = dark gray; spindled nuclei in unclassified regions = yellow; round nuclei in unclassified regions = gray; background = white. After the classification of each image object, a rich feature set is constructed. (D) Processed images from patients alive at 5 years after surgery and from patients deceased at 5 years after surgery were used to construct an image-based prognostic model. After construction of the model, it was applied to a test set of breast cancer images (not used in model building) to classify patients as high or low risk of death by 5 years.
Sci Transl Med. 2011 Nov 9;3(108):108ra113
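The prognostic model in panel D is an L1-regularized logistic regression over the image features: its objective is the average logistic loss plus an L1 penalty that drives most feature weights to zero. A minimal sketch of that objective (the feature vectors and penalty weight `lam` are illustrative; C-Path fits this with its own solver, not the code below):

```python
import math

def predict_survival(w, b, x):
    """P(alive at 5 years) under a logistic model with weights w and intercept b."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def l1_logistic_objective(w, b, X, y, lam):
    """Mean negative log-likelihood plus an L1 penalty on the weights."""
    nll = 0.0
    for x, yi in zip(X, y):
        p = predict_survival(w, b, x)
        nll -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return nll / len(X) + lam * sum(abs(wi) for wi in w)
```

The L1 term is what makes the fitted model sparse, so only a small subset of the thousands of morphologic descriptors ends up carrying prognostic weight.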
83. Digital Pathologist
Sci Transl Med. 2011 Nov 9;3(108):108ra113
Top stromal features associated with survival.
primarily characterizing epithelial nuclear characteristics, such as size, color, and texture (21, 36). In contrast, after initial filtering of images to ensure high-quality TMA images and training of the C-Path models using expert-derived image annotations (epithelium and stroma labels to build the epithelial-stromal classifier and survival time and survival status to build the prognostic model), our image analysis system is automated with no manual steps, which greatly increases its scalability. Additionally, in contrast to previous approaches, our system measures thousands of morphologic descriptors of diverse elements, enabling the identification of prognostic features whose significance was not previously recognized.
Using our system, we built an image-based prognostic model on the NKI data set and showed that in this patient cohort the model was a strong predictor of survival and provided significant additional prognostic information to clinical, molecular, and pathological prognostic factors in a multivariate model. We also demonstrated that the image-based prognostic model, built using the NKI data set, is a strong prognostic factor on another, independent data set with very different
Top-ranking epithelial features (panels A to H of Fig. 5):
(A) SD of the ratio of the pixel intensity SD to the mean intensity for pixels within a ring of the center of epithelial nuclei
(B) The sum of the number of unclassified objects
(C) SD of the maximum blue pixel value for atypical epithelial nuclei
(D) Maximum distance between atypical epithelial nuclei
(E) Minimum elliptic fit of epithelial contiguous regions
(F) SD of distance between epithelial cytoplasmic and nuclear objects
(G) Average border between epithelial cytoplasmic objects
(H) Maximum value of the minimum green pixel intensity value in epithelial contiguous regions
Fig. 5. Top epithelial features. The eight panels in the figure (A to H) each shows one of the top-ranking epithelial features from the bootstrap analysis. Left panels, improved prognosis; right panels, worse prognosis. (A) SD of the (SD of intensity/mean intensity) for pixels within a ring of the center of epithelial nuclei. Left, relatively consistent nuclear intensity pattern (low score); right, great nuclear intensity diversity (high score). (B) Sum of the number of unclassified objects. Red, epithelial regions; green, stromal regions; no overlaid color, unclassified region. Left, few unclassified objects (low score); right, higher number of unclassified objects (high score). (C) SD of the maximum blue pixel value for atypical epithelial nuclei. Left, high score; right, low score. (D) Maximum distance between atypical epithelial nuclei. Left, high score; right, low score. (Insets) Red, atypical epithelial nuclei; black, typical epithelial nuclei. (E) Minimum elliptic fit of epithelial contiguous regions. Left, high score; right, low score. (F) SD of distance between epithelial cytoplasmic and nuclear objects. Left, high score; right, low score. (G) Average border between epithelial cytoplasmic objects. Left, high score; right, low score. (H) Maximum value of the minimum green pixel intensity value in epithelial contiguous regions. Left, low score indicating black pixels within epithelial region; right, higher score indicating presence of epithelial regions lacking black pixels.
and stromal matrix throughout the image, with thin cords of epithelial cells infiltrating through stroma across the image, so that each stromal matrix region borders a relatively constant proportion of epithelial and stromal regions. The stromal feature with the second largest coefficient (Fig. 4B) was the sum of the minimum green intensity value of stromal-contiguous regions. This feature received a value of zero when stromal regions contained dark pixels (such as inflammatory nuclei). The feature received a positive value when stromal objects were devoid of dark pixels. This feature provided information about the relationship between stromal cellular composition and prognosis and suggested that the presence of inflammatory cells in the stroma is associated with poor prognosis, a finding consistent with previous observations (32). The third most significant stromal feature (Fig. 4C) was a measure of the relative border between spindled stromal nuclei to round stromal nuclei, with an increased relative border of spindled stromal nuclei to round stromal nuclei associated with worse overall survival. Although the biological underpinning of this morphologic feature is currently not known, this analysis suggested that spatial relationships between different populations of stromal cell types are associated with breast cancer progression.
Reproducibility of C-Path 5YS model predictions on samples with multiple TMA cores
For the C-Path 5YS model (which was trained on the full NKI data set), we assessed the intrapatient agreement of model predictions when predictions were made separately on each image contributed by patients in the VGH data set. For the 190 VGH patients who contributed two images with complete image data, the binary predictions (high or low risk) on the individual images agreed with each other for 69% (131 of 190) of the cases and agreed with the prediction on the averaged data for 84% (319 of 380) of the images. Using the continuous prediction score (which ranged from 0 to 100), the median of the absolute difference in prediction score among the patients with replicate images was 5%, and the Spearman correlation among replicates was 0.27 (P = 0.0002) (fig. S3). This degree of intrapatient agreement is only moderate, and these findings suggest significant intrapatient tumor heterogeneity, which is a cardinal feature of breast carcinomas (33–35). Qualitative visual inspection of images receiving discordant scores suggested that intrapatient variability in both the epithelial and the stromal components is likely to contribute to discordant scores for the individual images. These differences appeared to relate both to the proportions of the epithelium and stroma and to the appearance of the epithelium and stroma. Last, we sought to analyze whether survival predictions were more accurate on the VGH cases that contributed multiple cores compared to the cases that contributed only a single core. This analysis showed that the C-Path 5YS model showed significantly improved prognostic prediction accuracy on the VGH cases for which we had multiple images compared to the cases that contributed only a single image (Fig. 7). Together, these findings show a significant degree of intrapatient variability and indicate that increased tumor sampling is associated with improved model performance.
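The agreement statistics used above (binary agreement between replicates, median absolute score difference, Spearman correlation) can be sketched as follows. `score_a` and `score_b` are synthetic stand-ins for the two replicate C-Path scores (0 to 100) per patient, and the 50-point high/low cutoff is an assumption, not the study's data or threshold.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
score_a = rng.uniform(0, 100, 190)          # replicate image 1, one per patient
score_b = np.clip(score_a + rng.normal(0, 30, 190), 0, 100)  # noisy replicate 2

threshold = 50  # hypothetical high/low-risk cutoff
binary_agreement = np.mean((score_a >= threshold) == (score_b >= threshold))
median_abs_diff = np.median(np.abs(score_a - score_b))
rho, p = spearmanr(score_a, score_b)

print(f"binary agreement: {binary_agreement:.0%}")
print(f"median |diff|:    {median_abs_diff:.1f}")
print(f"Spearman rho:     {rho:.2f} (P = {p:.2g})")
```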
DISCUSSION
[Fig. 4 image panels (A to C), each contrasting improved and worse prognosis. Panel labels: heat map of stromal matrix objects' mean absolute difference to neighbors; H&E image separated into epithelial and stromal objects.]
Fig. 4. Top stromal features associated with survival. (A) Variability in absolute difference in intensity between stromal matrix regions and neighbors. Top panel, high score (24.1); bottom panel, low score (10.5). (Insets) Top panel, high score; bottom panel, low score. Right panels, stromal matrix objects colored blue (low), green (medium), or white (high) according to each object's absolute difference in intensity to neighbors. (B) Presence
Top epithelial features. The eight panels in the figure (A to H) each shows one of the top-ranking epithelial features from the bootstrap analysis. Left panels, improved prognosis; right panels, worse prognosis.
84. [Figure 2: The framework of cancer metastases detection. Train: whole slide image ➞ sample ➞ training data (normal/tumor) ➞ deep model. Test: whole slide image ➞ overlapping image patches ➞ tumor probability map (0.0 to 1.0).]
We extract millions of small positive and negative patches from the set of training WSIs. If the small patch is located in a tumor region, it is a tumor/positive patch and labeled [...] more than 6 million parameters.
Table 2: Evaluation of Various Deep Models
Deep Learning for Identifying Metastatic Breast Cancer
International Symposium on Biomedical Imaging 2016
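The patch-extraction step described above can be sketched as a sliding window over the slide, with each patch labeled from the tumor annotation mask. The array shapes, patch size, and center-pixel labeling rule are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def extract_patches(wsi, mask, patch=256, stride=256):
    """Slide a window over a whole-slide image array; a patch is labeled
    tumor/positive (1) if its center pixel falls inside the tumor mask."""
    patches, labels = [], []
    h, w = mask.shape
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            cy, cx = y + patch // 2, x + patch // 2
            patches.append(wsi[y:y + patch, x:x + patch])
            labels.append(int(mask[cy, cx]))  # 1 = tumor, 0 = normal
    return np.stack(patches), np.array(labels)

# Toy example: a 1024x1024 "slide" whose top-left quadrant is annotated tumor.
wsi = np.zeros((1024, 1024), dtype=np.uint8)
mask = np.zeros((1024, 1024), dtype=bool)
mask[:512, :512] = True
X, y = extract_patches(wsi, mask)
print(X.shape, y.sum())  # 16 patches, 4 of them positive
```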
85. Deep Learning for Identifying Metastatic Breast Cancer
International Symposium on Biomedical Imaging 2016
Figure 4: Receiver Operating Characteristic (ROC) curve of slide-based classification: sensitivity versus the average number of false positives per image. Our submitted result was generated based on the [...]
[Truncated Discussion column; recoverable fragments: the paper presents a framework for automated detection of metastases in whole-slide images of sentinel lymph nodes; for the slide-based task a pathologist alone achieved a low error rate, and when the predictions of the deep learning system were combined with those of the pathologist, the AUC rose and the error rate fell to 0.5[...]]
• AUC of deep learning = 0.925
• AUC of pathologists = 0.966
• AUC of deep learning + pathologist = 0.995
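One simple way to realize "deep learning + pathologist" is to average the two prediction scores and compare AUCs. This is an illustrative sketch on synthetic labels and scores, not the combination method used in the competition.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, 400)                              # 1 = metastasis
model_score = np.clip(labels + rng.normal(0, 0.6, 400), 0, 1)  # noisier reader
path_score = np.clip(labels + rng.normal(0, 0.4, 400), 0, 1)   # better reader
combined = (model_score + path_score) / 2                      # simple ensemble

auc_model = roc_auc_score(labels, model_score)
auc_path = roc_auc_score(labels, path_score)
auc_comb = roc_auc_score(labels, combined)
print(f"model {auc_model:.3f}  pathologist {auc_path:.3f}  combined {auc_comb:.3f}")
```

Averaging two partially independent readers reduces the effective noise, which is the intuition behind the combined AUC exceeding either reader alone.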
91. SEPSIS
A targeted real-time early warning score (TREWScore)
for septic shock
Katharine E. Henry, David N. Hager, Peter J. Pronovost, Suchi Saria*
Sepsis is a leading cause of death in the United States, with mortality highest among patients who develop septic shock. Early aggressive treatment decreases morbidity and mortality. Although automated screening tools can detect patients currently experiencing severe sepsis and septic shock, none predict those at greatest risk of developing shock. We analyzed routinely available physiological and laboratory data from intensive care unit patients and developed “TREWScore,” a targeted real-time early warning score that predicts which patients will develop septic shock. TREWScore identified patients before the onset of septic shock with an area under the ROC (receiver operating characteristic) curve (AUC) of 0.83 [95% confidence interval (CI), 0.81 to 0.85]. At a specificity of 0.67, TREWScore achieved a sensitivity of 0.85 and identified patients a median of 28.2 [interquartile range (IQR), 10.6 to 94.2] hours before onset. Of those identified, two-thirds were identified before any sepsis-related organ dysfunction. In comparison, the Modified Early Warning Score, which has been used clinically for septic shock prediction, achieved a lower AUC of 0.73 (95% CI, 0.71 to 0.76). A routine screening protocol based on the presence of two of the systemic inflammatory response syndrome criteria, suspicion of infection, and either hypotension or hyperlactatemia achieved a lower sensitivity of 0.74 at a comparable specificity of 0.64. Continuous sampling of data from the electronic health records and calculation of TREWScore may allow clinicians to identify patients at risk for septic shock and provide earlier interventions that would prevent or mitigate the associated morbidity and mortality.
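The abstract's "sensitivity of 0.85 at a specificity of 0.67" corresponds to picking the ROC operating point closest to a target specificity. A sketch on synthetic risk scores (not the TREWScore model or its data):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(42)
y = rng.integers(0, 2, 1000)             # 1 = develops septic shock
score = y + rng.normal(0.0, 1.0, 1000)   # hypothetical continuous risk score

fpr, tpr, thresholds = roc_curve(y, score)
auc_val = auc(fpr, tpr)

# Pick the threshold whose specificity (1 - FPR) is closest to the target.
target_specificity = 0.67
i = int(np.argmin(np.abs((1 - fpr) - target_specificity)))
print(f"AUC = {auc_val:.2f}; at specificity {1 - fpr[i]:.2f}: "
      f"sensitivity = {tpr[i]:.2f}")
```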
INTRODUCTION
Seven hundred fifty thousand patients develop severe sepsis and septic shock in the United States each year. More than half of them are admitted to an intensive care unit (ICU), accounting for 10% of all ICU admissions, 20 to 30% of hospital deaths, and $15.4 billion in annual health care costs (1–3). Several studies have demonstrated that morbidity, mortality, and length of stay are decreased when severe sepsis and septic shock are identified and treated early (4–8). In particular, one study showed that mortality from septic shock increased by 7.6% with every hour that treatment was delayed after the onset of hypotension (9).
More recent studies comparing protocolized care, usual care, and early goal-directed therapy (EGDT) for patients with septic shock suggest that usual care is as effective as EGDT (10–12). Some have interpreted this to mean that usual care has improved over time and reflects important aspects of EGDT, such as early antibiotics and early aggressive fluid resuscitation (13). It is likely that continued early identification and treatment will further improve outcomes. However, the Simplified Acute Physiology Score (SAPS II), Sequential Organ Failure Assessment (SOFA) scores, Modified Early Warning Score (MEWS), and Simple Clinical Score (SCS) have been validated to assess illness severity and risk of death among septic patients (14–17). Although these scores are useful for predicting general deterioration or mortality, they typically cannot distinguish with high sensitivity and specificity which patients are at highest risk of developing a specific acute condition.
The increased use of electronic health records (EHRs), which can be queried in real time, has generated interest in automating tools that identify patients at risk for septic shock (18–20). A number of “early warning systems,” “track and trigger” initiatives, “listening applications,” and “sniffers” have been implemented to improve detection and timeliness of therapy for patients with severe sepsis and septic shock (18, 20–23). Although these tools have been successful at detecting patients currently experiencing severe sepsis or septic shock, none predict which patients are at highest risk of developing septic shock.
The adoption of the Affordable Care Act has added to the growing excitement around predictive models derived from electronic health records.
92. [Truncated Results column; recoverable fragments: the score was recomputed as new data became available, and a patient was flagged when his or her score crossed the threshold; in the validation set, the AUC was 0.83 (95% CI, 0.81 to 0.85) (Fig. 2); at the chosen specificity [FPR of 0.33], TREWScore achieved a sensitivity of 0.85 and identified patients a median of 28.2 hours (IQR, 10.6 to 94.2) before onset.
Identification of patients before organ dysfunction: more than two-thirds (68.8%) of patients were identified before any sepsis-related organ dysfunction (Fig. 3B).
Comparison of TREWScore with other methods: TREWScore was first compared to MEWS, a general metric used to track risk of catastrophic deterioration (17); MEWS was not developed for tracking sepsis.]
Fig. 2. ROC for detection of septic shock before onset in the validation
set. The ROC curve for TREWScore is shown in blue, with the ROC curve for
MEWS in red. The sensitivity and specificity performance of the routine
screening criteria is indicated by the purple dot. Normal 95% CIs are shown
for TREWScore and MEWS. TPR, true-positive rate; FPR, false-positive rate.
A targeted real-time early warning score (TREWScore)
for septic shock
AUC=0.83
At a specificity of 0.67, TREWScore achieved a sensitivity of 0.85 and identified patients a median of 28.2 hours before onset.
95. In an early research project involving 600 patient cases, the team was able to
predict near-term hypoglycemic events up to 3 hours in advance of the symptoms.
IBM Watson-Medtronic
Jan 7, 2016
96. Sugar.IQ
Based on the user's past records of food intake, the resulting blood glucose changes, and insulin injections, Watson predicts how the user's blood glucose will change after a meal.
98. Prediction of Ventricular Arrhythmia
Collaboration with Prof. Segyeong Joo (Asan Medical Center)
Analysed the “Physionet Spontaneous Ventricular Tachyarrhythmia Database” for 2.5 months (ongoing project)
Joo S, Choi KJ, Huh SJ, 2012, Expert Systems with Applications (Vol 39, Issue 3)
▪ Recurrent Neural Network with Only Frequency Domain Transform
• Input : Spectrogram with 129 features obtained after ectopic beats removal
• Stack of LSTM Networks
• Binary cross-entropy loss
• Trained with RMSprop
• Prediction Accuracy : 76.6% ➞ 89.6%
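The bullet points above can be sketched as a small stacked-LSTM classifier. Layer sizes and the number of spectrogram frames are assumptions, since the slide specifies only the 129 input features, dropout, binary cross-entropy, and RMSprop.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_frames, n_features = 60, 129   # assumed spectrogram shape per example

model = keras.Sequential([
    layers.Input(shape=(n_frames, n_features)),
    layers.LSTM(64, return_sequences=True),  # first LSTM in the stack
    layers.Dropout(0.5),
    layers.LSTM(32),                         # second LSTM, last hidden state
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),   # P(ventricular arrhythmia)
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```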
99. Prediction of Ventricular Tachycardia One Hour before Occurrence Using Artificial Neural Networks
Hyojeong Lee, Soo-Yong Shin, Myeongsook Seo, Gi-Byoung Nam & Segyeong Joo
Ventricular tachycardia (VT) is a potentially fatal tachyarrhythmia, which causes a rapid heartbeat as a result of improper electrical activity of the heart. This is a potentially life-threatening arrhythmia because it can cause low blood pressure and may lead to ventricular fibrillation, asystole, and sudden cardiac death. To prevent VT, we developed an early prediction model that can predict this event one hour before its onset using an artificial neural network (ANN) generated using 14 parameters obtained from heart rate variability (HRV) and respiratory rate variability (RRV) analysis. De-identified raw data from the monitors of patients admitted to the cardiovascular intensive care unit at Asan Medical Center between September 2013 and April 2015 were collected. The dataset consisted of 52 recordings obtained one hour prior to VT events and 52 control recordings. Two-thirds of the extracted parameters were used to train the ANN, and the remaining third was used to evaluate performance of the learned ANN. The developed VT prediction model proved its performance by achieving a sensitivity of 0.88, specificity of 0.82, and AUC of 0.93.
Sudden cardiac death (SCD) causes more than 300,000 deaths annually in the United States [1]. Coronary artery disease, cardiomyopathy, structural heart problems, Brugada syndrome, and long QT syndrome are well known causes of SCD [1–4]. In addition, spontaneous ventricular tachyarrhythmia (VTA) is a main cause of SCD, contributing to about 80% of SCDs [5]. Ventricular tachycardia (VT) and ventricular fibrillation (VF) comprise VTA. VT is defined as a very rapid heartbeat (more than 100 times per minute), which does not allow enough time for the ventricles to fill with blood between beats. VT may terminate spontaneously after a few seconds; however, in some cases, VT can progress to a more dangerous or fatal arrhythmia, VF. Accordingly, early prediction of VT will help in reducing mortality from SCD by allowing for preventive care of VTA.
Several studies have reported attempts at predicting VTAs by assessing the occurrence of syncope, left ventricular systolic dysfunction, QRS (Q, R, and S wave in electrocardiogram) duration, QT (Q and T wave) dispersion, Holter monitoring, signal-averaged electrocardiograms (ECGs), heart rate variability (HRV), T wave alternans, electrophysiologic testing, B-type natriuretic peptides, and other parameters or methods [6–10]. Among these studies, prediction of VTAs based on HRV analysis has recently emerged and shown potential for predicting VTA [11–13].
Previous studies have focused on the prediction of VT using HRV analysis. In addition, most studies assessed the statistical value of each parameter calculated on or prior to the VT event and parameters of control data, which were collected from Holter recordings and implantable cardioverter defibrillators (ICDs) [12,14,15]. However, the results were not satisfactory in predicting fatal events like VT.
To make a better prediction model of VT, it is essential to utilize multiple parameters from various methods of HRV analysis and to generate a classifier that can deal with complex patterns composed of such parameters [7]. Artificial neural network (ANN) is a valuable tool for classification of a database with multiple parameters. ANN is a kind of machine learning algorithm, which can be trained using data with multiple parameters [16]. After training, the ANN calculates an output value according to the input parameters, and this output value can be used
Lee H. et al, Scientific Report, 2016
100. Prediction of Ventricular Tachycardia One Hour before
Occurrence Using Artificial Neural Networks
www.nature.com/scientificreports/
in pattern recognition or classification. ANN has not been widely used in medical analysis since the algorithm is not intuitive for physicians. However, utilization of ANN in medical research has recently emerged [17–19]. Our
Parameters | Control dataset (n=110), Mean±SD | VT dataset (n=110), Mean±SD | p-Value
Mean NN (ms) | 0.709±0.149 | 0.718±0.158 | 0.304
SDNN (ms) | 0.061±0.042 | 0.073±0.045 | 0.013
RMSSD (ms) | 0.068±0.053 | 0.081±0.057 | 0.031
pNN50 (%) | 0.209±0.224 | 0.239±0.205 | 0.067
VLF (ms²) | 4.1E-05±6.54E-05 | 6.23E-05±9.81E-05 | 0.057
LF (ms²) | 7.61E-04±1.16E-03 | 1.04E-03±1.15E-03 | 0.084
HF (ms²) | 1.53E-03±2.02E-03 | 1.96E-03±2.16E-03 | 0.088
LF/HF | 0.498±0.372 | 0.533±0.435 | 0.315
SD1 (ms) | 0.039±0.029 | 0.047±0.032 | 0.031
SD2 (ms) | 0.081±0.057 | 0.098±0.06 | 0.012
SD1/SD2 | 0.466±0.169 | 0.469±0.164 | 0.426
RPdM (ms) | 2.73±0.817 | 2.95±0.871 | 0.038
RPdSD (ms) | 0.721±0.578 | 0.915±0.868 | 0.075
RPdV | 28.4±5.31 | 25.4±3.56 | <0.002
Table 1. Comparison of HRV and RRV parameters between the control and VT dataset.
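The time-domain HRV parameters in Table 1 (mean NN, SDNN, RMSSD, pNN50) can be computed directly from a series of NN intervals; the interval values below are made up for illustration.

```python
import numpy as np

nn = np.array([0.71, 0.74, 0.68, 0.80, 0.72, 0.68, 0.75])  # NN intervals (s)
diff = np.diff(nn)                       # successive interval differences

mean_nn = nn.mean()                      # Mean NN
sdnn = nn.std(ddof=1)                    # SDNN: SD of NN intervals
rmssd = np.sqrt(np.mean(diff ** 2))      # RMSSD: RMS of successive differences
pnn50 = np.mean(np.abs(diff) > 0.050)    # pNN50: fraction of diffs > 50 ms

print(f"Mean NN = {mean_nn:.3f} s, SDNN = {sdnn:.3f} s, "
      f"RMSSD = {rmssd:.3f} s, pNN50 = {pnn50:.1%}")
```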
ANN with | Inputs | Sensitivity (%) | Specificity (%) | Accuracy (%) | PPV (%) | NPV (%) | AUC
HRV parameters | 11 | 70.6 (12/17) | 76.5 (13/17) | 73.5 (25/34) | 75.0 (12/16) | 72.2 (13/18) | 0.75
RRV parameters | 3 | 82.4 (14/17) | 82.4 (14/17) | 82.4 (28/34) | 82.4 (14/17) | 82.4 (14/17) | 0.83
HRV+RRV parameters | 14 | 88.2 (15/17) | 82.4 (14/17) | 85.3 (29/34) | 83.3 (15/18) | 87.5 (14/16) | 0.93
Table 2. Performance of three ANNs in predicting a VT event 1 hour before onset for the test dataset.
Lee H. et al, Scientific Report, 2016
This ANN with 13 hidden neurons in one hidden layer showed the best performance.
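Table 2's percentages follow directly from the test-set confusion counts; for the HRV+RRV network the parenthesized fractions imply 15/17 true positives and 14/17 true negatives.

```python
# Confusion counts implied by Table 2 for the HRV+RRV network (test set).
tp, fn = 15, 2   # VT recordings predicted positive / negative
tn, fp = 14, 3   # control recordings predicted negative / positive

sensitivity = tp / (tp + fn)                   # 15/17, about 88.2%
specificity = tn / (tn + fp)                   # 14/17, about 82.4%
accuracy = (tp + tn) / (tp + tn + fp + fn)     # 29/34, about 85.3%
ppv = tp / (tp + fp)                           # 15/18, about 83.3%
npv = tn / (tn + fn)                           # 14/16, exactly 87.5%
print(sensitivity, specificity, accuracy, ppv, npv)
```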
101. Figure 1. ROC curve of three ANNs (dashed line, with only HRV parameters; dash-dot line, with only RRV parameters; solid line, with HRV and RRV parameters; dotted line, reference) used in the prediction of a VT event one hour before onset.
Prediction of Ventricular Tachycardia One Hour before
Occurrence Using Artificial Neural Networks
Lee H. et al, Scientific Report, 2016
102. •Ajou University Hospital: 80 beds across three units (trauma center, emergency room, and medical ICU)
•Eight streams of patient vital-sign data (oxygen saturation, blood pressure, pulse, EEG, body temperature, etc.) integrated into a single store
•Real-time AI monitoring and analysis of the vital signs to make predictions 1-3 hours in advance
•For conditions such as arrhythmia, sepsis, acute respiratory distress syndrome (ARDS), and unplanned intubation
104. •Medical applications of AI
•Analyzing complex data and deriving recommendations
•Analyzing and reading medical imaging and pathology data
•Monitoring and predicting continuous data streams
•New issues
• Whether AI can replace doctors
• Who is responsible for the outcomes
• The necessity and difficulty of generating evidence
How Will Artificial Intelligence Innovate the Future of Medicine
110. •Can we compare the performance of human doctors and AI doctors?
•Technical issues
•Accuracy could be verified retrospectively
•But prospectively, on real patient populations,
•to demonstrate superiority or non-inferiority,
•could we run a double-blinded, randomised, controlled trial?
•Ethical issues
111. If machines take over all mechanical work,
what will the role of humans be?
Before that: what roles does a doctor play today?
113. •J&J launched Sedasys, a sedation-anesthesia robot, in 2014
•A medical anesthesia robot that injects propofol to sedate patients for colonoscopy and endoscopy
•Adjusts the dose according to the patient's vital signs, such as blood oxygen level and heart rate
•Approved by the FDA in 2013 and deployed to hospitals in the US, Australia, and Canada from 2014
•Cut the cost of sedation endoscopy to about one-tenth ($2,000 vs. $150-200)
•Anesthesiologist societies ran large-scale opposition campaigns and lobbied politicians for regulation
•Wall Street Journal: “J&J lost the fight with anesthesiologists who faced a threat to their income”
115. If machines take over all mechanical work,
what will the role of humans be?
What roles does a doctor play today?
117. •Roles that will disappear
•Mechanical roles: work that machines can do more easily and accurately
•Judgments based on evidence and logic
•Anything that can be drawn as a flowchart
•“Can I logically explain why I made that decision?”
•“Would other doctors make a similar decision?”
•“Would I make the same decision if I looked at this a month later?”
•Monitoring, interpreting, and reading medical data
121. •Roles that will remain or be emphasized
•The final medical decision
•Humane work that only humans can do
•Human touch
•Communication, empathy, care …
•Roles beyond examining and treating patients
•Basic research
•Creating new data and new standards ➞ fed back into the machines
122. Over the course of a career, an oncologist may impart bad news an average of 20,000 times,
but most practicing oncologists have never received any formal training to help them
prepare for such conversations.
123. High levels of empathy in primary care physicians correlate with
better clinical outcomes for their patients with diabetes
124. •New roles
•Training in how to use AI in clinical practice
•Research and guidelines needed on how exactly to use it, in terms of:
•clinical outcome
•quality of care
•cost effectiveness
•Medical education will also have to change to fit these roles
128. 1940s: pilot 1, pilot 2, flight engineer, navigator, radio operator
1950s: pilot 1, pilot 2, flight engineer, navigator
1960s: pilot 1, pilot 2, flight engineer
1980s: pilot 1, pilot 2
129. “The era of airplanes without pilots will come. It is only a matter of time.”
James Albaugh, Boeing, 2011
134. Deskilling of the crew
Excessive reliance on automation degrades pilots' expertise and reflexes, and manual flying skills atrophy.
•Experiment with 66 veteran pilots
•Flying a Boeing 737 whose engine had exploded
•Simulation of landing the aircraft under manual control
Most barely passed.
The amount of manual flight time in the two months before the experiment correlated with flying skill.
137. •The most sensitive issue, and the biggest obstacle to real-world adoption
•Not a simple problem, because many variables are involved
•Bottom line: the final decision is made by the human doctor
•For now, who bears the responsibility?
•Who is responsible for diagnoses and medical decisions?
•Realistically, are medical decisions made only by doctors?
Who is responsible for the outcome?
138. •The answer can vary with the form of the AI and how it is used.
•Output format: rank / score / traffic light (high/middle/low)
•Transparency of evidence and process: evidence given or not / reasoning transparent or a black box
•When the human doctor steps in:
•pre-screening: AI, then human doctor
•double reading: AI + human doctor
•double check (second opinion): human doctor, then AI
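The three intervention points listed above can be sketched as control flow, with hypothetical `ai()` and `doctor()` stand-ins for an AI model and a human read.

```python
def ai(case):     return "benign"      # stand-in for an AI model's output
def doctor(case): return "benign"      # stand-in for a human doctor's read

def pre_screening(case):
    # AI reads first; the doctor reviews only cases the AI flags.
    return doctor(case) if ai(case) != "benign" else "benign"

def double_reading(case):
    # AI and doctor read independently; disagreement triggers review.
    a, d = ai(case), doctor(case)
    return d if a == d else "needs review"

def double_check(case):
    # Doctor reads first; the AI acts as a second opinion afterwards.
    d = doctor(case)
    return d if ai(case) == d else "needs review"

print(pre_screening("case-1"), double_reading("case-1"), double_check("case-1"))
```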
141. •The answer can vary with the form of the AI and how it is used.
•Output format: rank / score / traffic light (high/middle/low)
•Transparency of evidence and process: evidence given or not / reasoning transparent or a black box
•When the human doctor steps in:
•pre-screening: AI, then human doctor
•double reading: AI + human doctor
•double check (second opinion): human doctor, then AI
144. THE BLACK BOX OF AI
Nature, Vol 538, 6 October 2016, p. 20
145. •The answer can vary with the form of the AI and how it is used.
•Output format: rank / score / traffic light (high/middle/low)
•Transparency of evidence and process: evidence given or not / reasoning transparent or a black box
•When the human doctor steps in:
•pre-screening: AI ➞ human doctor
•double reading: AI + human doctor
•double check: human doctor ➞ AI
147. The evidence is still lacking
• Analytical validity
• Clinical validity
• Clinical utility
+
• Cost-effectiveness
• Efficiency of clinical practice
Output format
Evidence and process
Mode of use
148. IBM Watson milestones, 2011-2016 (timeline graphic):
•Won Jeopardy!
•Partnership with New York's Memorial Sloan Kettering Cancer Center (lung cancer)
•Partnership with MD Anderson (leukemia)
•Partnership with Cleveland Clinic (cancer genome analysis)
•MD Anderson pilot results announced @ASCO
•Watson Fund invests in WellTok ($22M)
•Partnership with The New York Genome Center (glioblastoma analysis)
•GeneMD selected as winner of the Watson Mobile Developer Challenge
•Watson Fund invests in Pathway Genomics
•IBM Korea establishes a Watson business unit
•Launch of Watson Health; acquisition of Phytel & Explorys
•Demonstration of a blood glucose management app with Medtronic; partnerships with J&J, Apple, and Medtronic
•Alliance with Epic & Mayo Clinic (EHR data analysis)
•Adoption by the University of Tokyo (oncology)
•Alliance with 14 cancer centers (cancer genome analysis)
•Partnership with Mayo Clinic (clinical trial matching)
•Watson Fund invests in Modernizing Medicine
•Watson adopted by Bumrungrad International Hospital, Thailand
•Pathway Genomics OME begins closed alpha (2016)
•Acquisition of Merge Healthcare (medical imaging data)
•Acquisition of Truven Health
•Sleep study launched through Apple ResearchKit
•Adoption by Manipal Hospitals, India (oncology)
Even Watson, the byword for AI, has not yet shown sufficient evidence:
accuracy / medical utility / efficiency of care / cost savings
149. Q: Watson has been at MSKCC for five years now, but there is still little data or evidence on its accuracy or effectiveness. Why is that?
A: There probably has not yet been enough time to validate its effectiveness.
Q: Do you know when we might see that evidence?
A: We are not sure. We are waiting for such evidence ourselves.
150. •If we were to design a clinical trial of Watson Oncology:
• What should the primary and secondary outcomes be?
• How do we demonstrate cost-effectiveness?
• A system tuned to each individual hospital: an issue for the generalizability of any study
155. •Medical applications of AI
•Analyzing complex data and deriving recommendations
•Analyzing and reading medical imaging and pathology data
•Monitoring and predicting continuous data streams
•New issues
• Whether AI can replace doctors
• Who is responsible for the outcomes
• The necessity and difficulty of generating evidence
How Will Artificial Intelligence Innovate the Future of Medicine