by Yoon Sup Choi, PhD
yoonsup.choi@gmail.com
Professor, SAHIST, Sungkyunkwan University
Director, Digital Healthcare Institute
Managing Partner, Digital Healthcare Partners
How to implement digital medicine in the future
1. How to Implement Digital Healthcare in the Future
Professor, SAHIST, Sungkyunkwan University
Director, Digital Healthcare Institute
Yoon Sup Choi, Ph.D.
2. “It's in Apple's DNA that technology alone is not enough.
It's technology married with liberal arts.”
13. Investment of Google Ventures in 2014-2015
2014: Life Science & Health 36%, Mobile 27%, Enterprise & Data 24%, Consumer 8%, Commerce 5%
2015: Life Science & Health 31%, Consumer 24%, Enterprise 23%, Data & AI 13%, Others 9%
16. Diagram of healthcare-related fields (ver 0.3)
Healthcare: health management in the broad sense, involving neither digital technology nor the professional medical domain (e.g., exercise, nutrition, sleep)
Digital healthcare: health management that uses digital technology (e.g., Internet of Things, artificial intelligence, 3D printing, VR/AR)
Mobile healthcare: the subset of digital healthcare that uses mobile technology (e.g., smartphones, IoT, social media)
Personal genome analysis: e.g., cancer genomics, disease risk, carrier status, drug sensitivity; e.g., wellness, ancestry analysis
Medicine: the professional medical domain of disease prevention, treatment, prescription, and management
Telemedicine / remote care
17. What is the most important factor in digital medicine?
18. “Data! Data! Data!” he cried. “I can’t make bricks without clay!”
- Sherlock Holmes, “The Adventure of the Copper Beeches”
19.
20. New data is measured, stored, integrated, and analyzed
in new ways and by new players.
Types of data; qualitative and quantitative aspects of data:
wearable devices, smartphones, personal genome analysis, artificial intelligence, social media
New players: users/patients, the general public
21. Three Steps to Implement Digital Medicine
• Step 1. Measure the Data
• Step 2. Collect the Data
• Step 3. Insight from the Data
22. Digital Healthcare Industry Landscape
Data Measurement Data Integration Data Interpretation Treatment
Smartphone Gadget/Apps
DNA
Artificial Intelligence
2nd Opinion
Wearables / IoT
(ver. 3)
EMR/EHR 3D Printer
Counseling
Data Platform
Accelerator/early-VC
Telemedicine
Device
On Demand (O2O)
VR
Digital Healthcare Institute
Director, Yoon Sup Choi, Ph.D.
yoonsup.choi@gmail.com
23. Data Measurement Data Integration Data Interpretation Treatment
Smartphone Gadget/Apps
DNA
Artificial Intelligence
2nd Opinion
Device
On Demand (O2O)
Wearables / IoT
Digital Healthcare Institute
Director, Yoon Sup Choi, Ph.D.
yoonsup.choi@gmail.com
EMR/EHR 3D Printer
Counseling
Data Platform
Accelerator/early-VC
VR
Telemedicine
Digital Healthcare Industry Landscape (ver. 3)
48. Beyond Verbal
• What if machines could understand human emotions?
• Highly applicable in healthcare: detecting emotions such as sadness, depression, and fatigue
• Some insurance companies already use it to screen members for depression
• Aetna has analyzed customers' voices on phone calls since 2012 to detect depression
• Identified 6x as many depression cases as previous methods
• Privacy concerns remain
51. Digital Phenotype:
Your smartphone knows if you are depressed
J Med Internet Res. 2015 Jul 15;17(7):e175.
The correlation analysis between the features and the PHQ-9 scores revealed that 6 of the 10
features were significantly correlated with the scores:
• strong correlation: circadian movement, normalized entropy, location variance
• correlation: the phone usage features, usage duration and usage frequency
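A minimal sketch of the correlation step described above, with invented per-participant values standing in for the study's sensor-derived features (the paper's exact correlation statistic is not reproduced here; Pearson is used for illustration):

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical values; the study's features came from weeks of GPS and
# phone-usage logs per participant.
phq9 = np.array([2.0, 5.0, 11.0, 17.0, 8.0, 14.0, 3.0, 20.0])
features = {
    "circadian_movement": np.array([0.9, 0.7, 0.4, 0.2, 0.6, 0.3, 0.8, 0.1]),
    "location_variance":  np.array([1.2, 1.0, 0.5, 0.3, 0.8, 0.4, 1.1, 0.2]),
    "usage_duration":     np.array([1.1, 1.6, 2.8, 3.9, 2.1, 3.2, 1.3, 4.4]),
}
for name, values in features.items():
    r, p = pearsonr(values, phq9)  # correlation of each feature with PHQ-9
    print(f"{name:20s} r = {r:+.2f}, p = {p:.3f}")
```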
52. Digital Phenotype:
Your smartphone knows if you are depressed
J Med Internet Res. 2015 Jul 15;17(7):e175.
Figure 4. Comparison of location and usage feature statistics between participants with no symptoms of depression (blue) and those with symptoms (red). Feature values are scaled between 0 and 1 for easier comparison. Boxes extend between the 25th and 75th percentiles, and whiskers show the range. Horizontal solid lines inside the boxes are medians. One, two, and three asterisks show significant differences at P<.05, P<.01, and P<.001 levels, respectively.
Figure 5. Coefficients of correlation between location features. One, two, and three asterisks indicate significant correlation levels at P<.05, P<.01, and P<.001, respectively.
(ENT, entropy; ENTN, normalized entropy; LV, location variance; HS, home stay; TT, transition time; TD, total distance; CM, circadian movement; NC, number of clusters; UF, usage frequency; UD, usage duration)
Slide callouts: entropy — the variability of the time the participant spent at the location clusters; circadian movement — to what extent the participants' sequence of locations followed a circadian rhythm; home stay.
Saeb et al., Journal of Medical Internet Research
54. Digital Phenotype:
Your Instagram knows if you are depressed
Results
Both All-data and Pre-diagnosis models were decisively superior to a null model
(K_All = 157.5; K_Pre = 149.8). All-data predictors were significant with 99% probability.
Pre-diagnosis and All-data confidence levels were largely identical, with two exceptions:
Pre-diagnosis Brightness decreased to 90% confidence, and Pre-diagnosis posting frequency
dropped to 30% confidence, suggesting a null predictive value in the latter case.
Increased hue, along with decreased brightness and saturation, predicted depression. This
means that photos posted by depressed individuals tended to be bluer, darker, and grayer (see
Fig. 2). The more comments Instagram posts received, the more likely they were posted by
depressed participants, but the opposite was true for likes received. In the All-data model, higher
posting frequency was also associated with depression. Depressed participants were more likely
to post photos with faces, but had a lower average face count per photograph than healthy
participants. Finally, depressed participants were less likely to apply Instagram filters to their
posted photos.
Fig. 2. Magnitude and direction of regression coefficients in All-data (N=24,713) and Pre-diagnosis (N=18,513)
models. X-axis values represent the adjustment in odds of an observation belonging to depressed individuals, per […]
Reece & Danforth, “Instagram photos reveal predictive markers of depression” (2016)
Fig. 1. Comparison of HSV values. Right photograph has higher Hue (bluer), lower Saturation (grayer), and lower
Brightness (darker) than left photograph. Instagram photos posted by depressed individuals had HSV values
shifted towards those in the right photograph, compared with photos posted by healthy individuals.
Units of observation
In determining the best time span for this analysis, we encountered a difficult question:
When and for how long does depression occur? A diagnosis of depression does not indicate the
persistence of a depressive state for every moment of every day, and to conduct analysis using an
individual’s entire posting history as a single unit of observation is therefore rather specious. At
the other extreme, to take each individual photograph as units of observation runs the risk of
being too granular. DeChoudhury et al. (5) looked at all of a given user’s posts in a single day,
and aggregated those data into perperson, perday units of observation. We adopted this
precedent of “userdays” as a unit of analysis . 5
Statistical framework
We used Bayesian logistic regression with uninformative priors to determine the strength
of individual predictors. Two separate models were trained. The Alldata model used all
collected data to address Hypothesis 1. The Prediagnosis model used all data collected from
higher Hue (bluer)
lower Saturation (grayer)
lower Brightness (darker)
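A minimal sketch of this modeling step, on synthetic stand-in data: with flat (uninformative) priors, the Bayesian MAP estimate for logistic regression coincides with the ordinary maximum-likelihood fit, so an unpenalized statsmodels Logit is used here as a stand-in for the full posterior analysis described in the paper.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))  # hypothetical hue / saturation / brightness features
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1] - 0.8 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(float)  # 1 = depressed user's post

model = sm.Logit(y, sm.add_constant(X)).fit(disp=0)  # MLE = MAP under a flat prior
print(model.params)      # signs: +hue, -saturation, -brightness, as in Fig. 2
print(model.conf_int())  # 95% intervals, analogous to credible intervals here
```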
55. Digital Phenotype:
Your Instagram knows if you are depressed
Reece & Danforth, “Instagram photos reveal predictive markers of depression” (2016)
Instagram filter usage differed between depressed and healthy participants
(χ²_All = 907.84, p = 9.17e−64; χ²_Pre = 813.80, p = 2.87e−44). In particular, depressed
participants were less likely than healthy participants to use any filters at all. When depressed
participants did employ filters, they most disproportionately favored the "Inkwell" filter, which
converts color photographs to black-and-white images. Conversely, healthy participants most
disproportionately favored the Valencia filter, which lightens the tint of photos. Examples of
filtered photographs are provided in SI Appendix VIII.
Fig. 3. Instagram filter usage among depressed and healthy participants. Bars indicate difference between observed
and expected usage frequencies, based on a Chi-squared analysis of independence. Blue bars indicate
disproportionate use of a filter by depressed compared to healthy participants, orange bars indicate the reverse.
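The filter analysis is a standard chi-squared test of independence on a filter-by-group contingency table; a minimal sketch with invented counts (not the study's data):

```python
import numpy as np
from scipy.stats import chi2_contingency

#                    Inkwell  Valencia  no filter
observed = np.array([[30,       5,       120],    # depressed participants
                     [10,      40,        80]])   # healthy participants

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, p={p:.2e}, dof={dof}")
# The bars in Fig. 3 correspond to observed - expected for each filter.
print(observed - expected)
```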
56. Digital Phenotype:
Your Instagram knows if you are depressed
Reece & Danforth, “Instagram photos reveal predictive markers of depression” (2016)
VIII. Instagram filter examples
Fig. S8. Examples of Inkwell and Valencia Instagram filters. Inkwell converts
color photos to black-and-white; Valencia lightens tint. Depressed participants
most favored Inkwell compared to healthy participants; healthy participants most favored Valencia.
57.
58. • Users can share their own medical/health data, measured with the iPhone's sensors, to the research platform
• Uses the accelerometer, microphone, gyroscope, GPS sensor, and more
• Steps, physical activity, memory, voice tremor, etc.
• Addresses a long-standing problem in medical research: securing enough medical data
• Removes physical and temporal barriers to enrolling participants (once per 3 months → once per second)
• Encourages public participation in medical research, increasing the number of participants
• Tens of thousands of participants signed up within 24 hours of launch
• Conducted only with the user's own consent
ResearchKit
62. Autism and Beyond / EpiWatch / Mole Mapper
• Autism and Beyond: measuring facial expressions of young patients with autism
• EpiWatch: measuring behavioral data of epilepsy patients
• Mole Mapper: measuring morphological changes of moles
63. • myHeart, Stanford's cardiovascular disease research app
• 11,000 participants enrolled within a day of release
• Alan Yeung, the study lead at Stanford:
"To get 11,000 participants the conventional way, we would have to recruit
for a year at 50 hospitals across the United States."
64. • mPower, a Parkinson's disease research app
• 5,589 participants enrolled within a day of release
• Previously, a $60 million, 5-year recruiting effort had enrolled
only 800 patients
69. Fig 1. What can consumer wearables do? Heart rate can be measured with an oximeter built into a ring [3], muscle activity with an electromyographic sensor embedded into clothing [4], stress with an electrodermal sensor incorporated into a wristband [5], and physical activity or sleep patterns via an accelerometer in a watch [6,7]. In addition, a female's most fertile period can be identified with detailed body temperature tracking [8], while levels of mental attention can be monitored with a small number of non-gelled electroencephalogram (EEG) electrodes [9]. Levels of social interaction […]
PLOS Medicine 2016
70. PwC Health Research Institute, Health wearables: Early days
Figure 2: Wearables are not mainstream – yet
Just one in five US consumers say they own a wearable device:
• 21% of US consumers currently own a wearable technology product
• 10% wear it every day
• 7% wear it a few times a week
• 2% wear it a few times a month
• 2% no longer use it
Source: HRI/CIS Wearables consumer survey 2014
PwC, Health wearables: early days, 2014
71. PwC, The Wearable Life 2.0, 2016
• 49% own at least one wearable device (up from 21% in 2014), and 36% own more than one. We didn't even ask about multiple ownership in the previous survey, since it wasn't relevant at the time; that's how far we've come.
• Millennials are far more likely to own wearables than older adults; adoption of wearables declines with age. Of note in the survey findings, however: consumers aged 35 to 49 are more likely to own smart watches.
• Across the board for gender, age, and ethnicity, fitness wearable technology is most popular ("fitness runs away with it"): fitness bands lead ownership at 45%, with the other categories — smart watch, smart video/photo device (e.g., GoPro), smart clothing, and smart glasses (including VR/AR glasses) — ranging from 12% to 27%.
Base: respondents who currently own at least one device (pre-quota sample, n=700)
92. IEEE Trans Biomed Eng. 2014 Jul
An Ingestible Sensor
for Measuring Medication Adherence
The 0.9% of devices that went undetected represent
contributions from all components of the system. For the
sensor, the most likely contribution is due to physiological
corner cases, where a combination of stomach environment
and receiver-sensor orientation may result in a small
proportion of devices (no greater than 0.9%) being missed.
Table IV. Exposure and performance in clinical trials
• 412 subjects; 20,993 ingestions
• Maximum daily ingestions: 34; maximum use: 90 days
• 99.1% detection accuracy; 100% correct identification; 0% false positives
• No SAEs/UADEs related to the system
Trials were conducted in the following patient populations (number of patients in parentheses): Healthy Volunteers (296), Cardiovascular disease (53), Tuberculosis (30), Psychiatry (28).
(SAE = Serious Adverse Event; UADE = Unanticipated Adverse Device Effect)
111. Inherited Conditions
Hemochromatosis is a genetic disorder of iron metabolism in which too much of the iron consumed in food is absorbed. The excess iron accumulates in various organs, particularly the liver, heart, and pancreas, and damages them, causing liver disease, heart disease, and malignant tumors.
112. Traits
• Alcohol flush reaction
• Bitter taste perception
• Earwax type
• Eye color
• Curly hair
• Lactose digestion
• Malaria resistance
• Likelihood of male pattern baldness
• Muscle performance
• Blood type
• Norovirus resistance
• HIV resistance
• Likelihood of nicotine dependence
118. DNA SEQUENCING SOARS
Human genomes are being sequenced at an ever-increasing rate. The 1000 Genomes Project has aggregated hundreds of genomes; The Cancer Genome Atlas (TCGA) has gathered several thousand; and the Exome Aggregation Consortium (ExAC) has sequenced more than 60,000 exomes. Dotted lines show three possible future growth curves.
[Chart: cumulative number of human genomes, 10^0 to 10^9 on a log scale, 2001-2025. Milestones: Human Genome Project, 1st personal genome, 1000 Genomes, TCGA, ExAC, current amount. Recorded growth, then projections: double every 7 months (historical growth rate), every 12 months (Illumina estimate), every 18 months (Moore's law).]
Michael Eisenstein, Nature, 2015
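The three dotted projection curves are simple doubling-time extrapolations; a quick back-of-the-envelope sketch (the starting value and dates are illustrative, not the chart's exact numbers):

```python
# Project cumulative genome counts under three doubling times,
# starting from an assumed ~1e6 genomes in 2015 and running to 2025.
current = 1e6  # illustrative 2015 figure, not taken from the chart
years = 10
for label, months in [("historical (7 mo)", 7),
                      ("Illumina estimate (12 mo)", 12),
                      ("Moore's law (18 mo)", 18)]:
    doublings = years * 12 / months          # number of doublings in 10 years
    estimate = current * 2 ** doublings
    print(f"{label:26s} 2025 estimate: {estimate:.2e}")
```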
119. Sequencing Applications in Medicine: from Prewomb to Tomb
Cell. 2014 Mar 27; 157(1): 241–253.
120. Step 1. Measure the Data
• With your smartphone
• With wearable devices (connected to smartphone)
• Personal genome analysis
... without even going to the hospital!
127. Epic MyChart App Epic EHR
Dexcom CGM
Patients/User
Devices
EHR Hospital
Withings
+
Apple Watch
Apps
HealthKit
128.
129. • Apple HealthKit is working with 14 of the 23 leading US hospitals
• A markedly faster start than the competing platforms Google Fit and S-Health
• CIO of Beth Israel Deaconess:
"A large share of our 250,000 patients are generating all kinds of data with wearables.
Our hospital cannot provide an interface to every one of these devices.
But Apple can."
2015.2.5
135. Epic MyChart Epic EHR
Dexcom CGM
Patients/User
Devices
EHR Hospital
Withings
+
Apple Watch
Apps
HealthKit
136. transfer from Share2 to HealthKit as mandated by Dexcom receiver
Food and Drug Administration device classification. Once the glucose
values reach HealthKit, they are passively shared with the Epic
MyChart app (https://www.epic.com/software-phr.php). The MyChart
patient portal is a component of the Epic EHR and uses the same data-
base, and the CGM values populate a standard glucose flowsheet in
the patient’s chart. This connection is initially established when a pro-
vider places an order in a patient’s electronic chart, resulting in a re-
quest to the patient within the MyChart app. Once the patient or
patient proxy (parent) accepts this connection request on the mobile
device, a communication bridge is established between HealthKit and
MyChart, enabling population of CGM data as frequently as every 5 minutes.
Participation required confirmation of Bluetooth pairing of the CGM re-
ceiver to a mobile device, updating the mobile device with the most recent
version of the operating system, Dexcom Share2 app, Epic MyChart app,
and confirming or establishing a username and password for all accounts,
including a parent’s/adolescent’s Epic MyChart account. Setup time aver-
aged 45–60 minutes in addition to the scheduled clinic visit. During this
time, there was specific verbal and written notification to the patients/par-
ents that the diabetes healthcare team would not be actively monitoring
or have real-time access to CGM data, which was out of scope for this pi-
lot. The patients/parents were advised that they should continue to contact
the diabetes care team by established means for any urgent questions/
concerns. Additionally, patients/parents were advised to maintain updates
Figure 1: Overview of the CGM data communication bridge architecture.
Kumar R B, et al. J Am Med Inform Assoc 2016;0:1–6. doi:10.1093/jamia/ocv206, Brief Communication
• Apple HealthKit integrated continuously monitored blood glucose data from Dexcom CGM devices into the EHR
• The study reports improved glucose management in diabetes patients
• Conducted at Stanford Children's Health and Stanford Medicine with 10 pediatric type 1 diabetes patients (288 readings/day)
• EHR-based data analysis and visualization improved data review and patient communication
• Patients could respond to real-time glucose changes, versus the conventional model of reviewing data only at clinic visits
JAMIA 2016
Remote Patient Monitoring
via Dexcom-HealthKit-Epic-Stanford
142. How long will you wait to see a doctor?
http://money.cnn.com/interactive/economy/average-doctor-wait-times/
143. Average Time to Appointment (Family Medicine)
[Chart: average days until a family-medicine appointment in 15 US metro markets (Boston, LA, Portland, Miami, Atlanta, Denver, Detroit, New York, Seattle, Houston, Philadelphia, Washington DC, San Diego, Dallas, Minneapolis) for 2009, 2014, and 2017. Overall averages: 20.3 days (2009), 19.5 days (2014), and 29.3 days (2017); 2017 city values ranged from 8 days up to 109 days, with Boston the longest in all three years (63, 66, and 109 days).]
156. • AP: robots write articles in place of human reporters
• Capable of generating 2,000 articles per second
• Coverage expanded from the earnings of 300 companies to 3,000
157. • 1978
• As part of the obscure task of “discovery” —
providing documents relevant to a lawsuit — the
studios examined six million documents at a
cost of more than $2.2 million, much of it to pay
for a platoon of lawyers and paralegals who
worked for months at high hourly rates.
• 2011
• Now, thanks to advances in artificial intelligence,
“e-discovery” software can analyze documents
in a fraction of the time for a fraction of the
cost.
• In January, for example, Blackstone Discovery of
Palo Alto, Calif., helped analyze 1.5 million
documents for less than $100,000.
158. • Japan's Fukoku Mutual Life Insurance decided to lay off more than 30 employees
who assessed insurance payouts and hand the work to IBM Watson Explorer
• Watson judges whether to pay a claim based on the medical records
• The switch to AI is expected to raise productivity by 30%
• ROI is expected within 2 years
• Year 1: 140m yen
• Year 2: 200m yen
163. • Artificial Narrow Intelligence (ANI)
• AI that performs well in one specific domain
• Chess, quiz shows, mail filtering, product recommendation, autonomous driving
• Artificial General Intelligence (AGI)
• Human-level AI across all domains
• Thinking, planning, problem solving, abstraction, learning complex concepts
• Artificial Super Intelligence (ASI)
• AI that surpasses humans in every domain, including science, technology, and social ability
• "Any sufficiently advanced technology is indistinguishable from magic" - Arthur C. Clarke
164.
165. When will machines acquire human-level intelligence?
[Chart: cumulative proportion of respondents, by year (2010-2100), assigning 10%, 50%, and 90% probability to human-level machine intelligence, across four expert surveys: PT-AI (Philosophy and Theory of AI, 2011), AGI (Artificial General Intelligence, 2012), EETN (Greek Association for Artificial Intelligence), TOP100 (survey of the 100 most frequently cited authors, 2013), and their combination.]
Superintelligence, Nick Bostrom (2014)
166. Superintelligence: Science or Fiction?
Panelists: Elon Musk (Tesla, SpaceX), Bart Selman (Cornell), Ray Kurzweil (Google),
David Chalmers (NYU), Nick Bostrom (FHI), Demis Hassabis (DeepMind), Stuart
Russell (Berkeley), Sam Harris, and Jaan Tallinn (CSER/FLI)
January 6-8, 2017, Asilomar, CA
https://brunch.co.kr/@kakao-it/49
https://www.youtube.com/watch?v=h0962biiZa4
167. Superintelligence: Science or Fiction?
Panelists: Elon Musk (Tesla, SpaceX), Bart Selman (Cornell), Ray Kurzweil (Google),
David Chalmers (NYU), Nick Bostrom (FHI), Demis Hassabis (DeepMind), Stuart
Russell (Berkeley), Sam Harris, and Jaan Tallinn (CSER/FLI)
January 6-8, 2017, Asilomar, CA
Q: Is superintelligence an achievable domain? — all nine panelists: YES
Q: Do you think an entity with superintelligence can actually emerge? — all nine panelists: YES
Q: Do you hope superintelligence will be realized? — Ray Kurzweil, Nick Bostrom, Demis Hassabis: YES; Elon Musk, Stuart Russell, Bart Selman, David Chalmers, Sam Harris, Jaan Tallinn: Complicated
https://brunch.co.kr/@kakao-it/49
https://www.youtube.com/watch?v=h0962biiZa4
170. • Artificial Narrow Intelligence (ANI)
• AI that performs well in one specific domain
• Chess, quiz shows, mail filtering, product recommendation, autonomous driving
• Artificial General Intelligence (AGI)
• Human-level AI across all domains
• Thinking, planning, problem solving, abstraction, learning complex concepts
• Artificial Super Intelligence (ASI)
• AI that surpasses humans in every domain, including science, technology, and social ability
• "Any sufficiently advanced technology is indistinguishable from magic" - Arthur C. Clarke
171. • Artificial Narrow Intelligence (ANI)
• AI that performs well in one specific domain
• Chess, quiz shows, mail filtering, product recommendation, autonomous driving
• Artificial General Intelligence (AGI)
• Human-level AI across all domains
• Thinking, planning, problem solving, abstraction, learning complex concepts
• Artificial Super Intelligence (ASI)
• AI that surpasses humans in every domain, including science, technology, and social ability
• "Any sufficiently advanced technology is indistinguishable from magic" - Arthur C. Clarke
172. “As soon as it works, no one calls it artificial intelligence any more.”
- John McCarthy (1927-2011)
181. 600,000 pieces of medical evidence
2 million pages of text from 42 medical journals and clinical trials
69 guidelines, 61,540 clinical trials
IBM Watson on Medicine
Watson learned...
+
1,500 lung cancer cases
physician notes, lab results and clinical research
+
14,700 hours of hands-on training
182.
183.
184. Annals of Oncology (2016) 27 (suppl_9): ix179-ix180. 10.1093/annonc/mdw601
Validation study to assess performance of IBM cognitive
computing system Watson for oncology with Manipal
multidisciplinary tumour board for 1000 consecutive cases:
An Indian experience
• Treatment recommendations by the MMDT (Manipal multidisciplinary tumour board) and data for 1,000 cases of
4 different cancers — breast (638), colon (126), rectum (124), and lung (112) — treated over the last 3 years were collected.
• Of the treatment recommendations given by the MMDT, 50% fell in WFO's
REC (recommended) category, 28% in FC (for consideration), and 17% in NREC (not recommended)
• Nearly 80% of the MMDT recommendations were in the WFO REC and FC groups
• 5% of the treatments given by the MMDT were not available in WFO
• The degree of concordance varied depending on the type of cancer:
• WFO REC concordance was highest in rectum (85%) and lowest in lung (17.8%)
• high with TNBC (67.9%); lower with HER2-negative disease (35%)
• WFO took a median of 40 seconds to capture, analyze, and give a treatment recommendation
(vs. a median of 15 minutes for the MMDT)
186. Empowering the Oncology Community for Cancer Care
Genomics / Oncology / Clinical Trial Matching
Watson Health’s oncology clients span more than 35 hospital systems
“Empowering the Oncology Community
for Cancer Care”
Andrew Norden, KOTRA Conference, March 2017, “The Future of Health is Cognitive”
187. IBM Watson in Healthcare
[Timeline, 2011-2017, with academia and business tracks. Milestones include:]
• Wins Jeopardy! (2011)
• Collaboration with New York's MSK Cancer Center (lung cancer)
• Collaboration with Cleveland Clinic (cancer genome analysis)
• Collaboration with MD Anderson (leukemia); pilot results later presented at ASCO
• Watson Fund invests in WellTok ($22m) and in Pathway Genomics; Pathway Genomics OME begins closed alpha
• GeneMD selected as winner of the Watson Mobile Developer Challenge
• Collaboration with The New York Genome Center (glioblastoma analysis)
• Watson Health launched (2015); Phytel & Explorys acquired; Merge Healthcare acquired
• Partnerships with J&J, Apple, and Medtronic; demo of a glucose-management app with Medtronic; Medtronic launches Sugar.IQ
• Alliance with Epic & Mayo Clinic (EHR data analysis); collaboration with Mayo Clinic (clinical trial matching)
• Partnership with 14 cancer centers (cancer genome analysis); collaboration with the Broad Institute announced (genomic analysis of anticancer drug resistance)
• Watson Fund invests in Modernizing Medicine
• Truven Health acquired; partnership with Under Armour; partnership with pharmaceutical company Teva; sleep study via Apple ResearchKit begins
• Adoptions: University of Tokyo (oncology); Manipal Hospital, India (later publishes WFO accuracy results); Bumrungrad International Hospital, Thailand; Gachon University Gil Medical Center (oncology); Konyang University Hospital (oncology); Pusan National University Hospital (oncology/genomics); Daegu Catholic University Medical Center and Daegu Dongsan Hospital (WFO)
• IBM Korea establishes a Watson business unit; draft AI guideline from the Korean MFDS
Yoon Sup Choi, Ph.D. — Director, Digital Healthcare Institute; Managing Partner, Digital Healthcare Partners
yoonsup.choi@gmail.com
188. IBM Watson in Healthcare (timeline repeated from the previous slide)
189. IBM Watson Health
Organizations Leveraging Watson
Watson for Oncology
Best Doctors (second opinion)
Bumrungrad International Hospital
Confidential client (Bangladesh and Nepal)
Gachon University Gil Medical Center (Korea)
Hangzhou Cognitive Care – 50+ Chinese hospitals
Jupiter Medical Center
Manipal Hospitals – 16 Indian Hospitals
MD Anderson (**Oncology Expert Advisor)
Memorial Sloan Kettering Cancer Center
MRDM - Zorg (Netherlands)
Pusan National University Hospital
Clinical Trial Matching
Best Doctors (second opinion)
Confidential – Major Academic Center
Highlands Oncology Group
Froedtert & Medical College of Wisconsin
Mayo Clinic
Multiple Life Sciences pilots
Watson Genomic Analytics
Ann & Robert H Lurie Children’s Hospital of Chicago
BC Cancer Agency
City of Hope
Cleveland Clinic
Columbia University, Irving Cancer Center
Duke Cancer Institute
Fred & Pamela Buffett Cancer Center
Fleury (Brazil)
Illumina 170 Gene Panel
NIH Japan
McDonnell Institute at Washington University in St. Louis
New York Genome Center
Pusan National University Hospital
Quest Diagnostics
Stanford Health
University of Kansas Cancer Center
University of North Carolina Lineberger Cancer Center
University of Southern California
University of Washington Medical Center
University of Tokyo
Yale Cancer Center
Andrew Norden, KOTRA Conference, March 2017, “The Future of Health is Cognitive”
196. Olga Russakovsky et al.
Fig. 4 Random selection of images in ILSVRC detection validation set. The images in the top 4 rows were taken from
ILSVRC2012 single-object localization validation set, and the images in the bottom 4 rows were collected from Flickr using
scene-level queries.
http://arxiv.org/pdf/1409.0575.pdf
197. • Main competition
• Classification: classify the objects in the image
• Localization: classify and locate 'one' object in the image
• Object detection: classify and locate 'all' objects in the image
Fig. 7 Tasks in ILSVRC. The first column shows the ground truth labeling on an example image, and the next three show
three sample outputs with the corresponding evaluation score.
http://arxiv.org/pdf/1409.0575.pdf
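As a concrete example of how the classification task is scored, here is a minimal sketch of the standard ILSVRC top-5 error metric (synthetic scores; 1,000 classes as in ILSVRC):

```python
import numpy as np

def top5_error(scores: np.ndarray, labels: np.ndarray) -> float:
    # scores: (n_images, n_classes) model confidences; labels: (n_images,)
    top5 = np.argsort(scores, axis=1)[:, -5:]        # 5 highest-scoring classes
    hits = (top5 == labels[:, None]).any(axis=1)     # true label in the top 5?
    return 1.0 - hits.mean()

rng = np.random.default_rng(0)
scores = rng.random((100, 1000))                     # random model, 1000 classes
labels = rng.integers(0, 1000, size=100)
print(f"top-5 error: {top5_error(scores, labels):.2%}")  # ~99.5% for random scores
```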
198. Performance of winning entries in the ILSVRC 2010-2015 competitions in each of the three tasks
[Charts: image classification error (2010-2015) and single-object localization error (2011-2015) dropped sharply over the years, while object detection average precision (2013-2015) rose.]
http://image-net.org/challenges/LSVRC/2015/results#loc
199.
200. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, “Deep Residual Learning for Image Recognition”, 2015
How deep is deep?
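What makes such depth trainable is the residual connection; a sketch of the core building block, in illustrative PyTorch rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Learns F(x) and outputs F(x) + x via an identity shortcut."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # the residual (shortcut) connection

# Stacking many blocks is what lets ResNets go very deep.
blocks = nn.Sequential(*[ResidualBlock(64) for _ in range(50)])
print(blocks(torch.randn(1, 64, 32, 32)).shape)
```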
204. DeepFace: Closing the Gap to Human-Level Performance in Face Verification
Taigman, Y. et al. (2014). DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR'14.
Figure 2. Outline of the DeepFace architecture. A front-end of a single convolution-pooling-convolution filtering on the rectified input, followed by three
locally-connected layers and two fully-connected layers. Colors illustrate feature maps produced at each layer. The net includes more than 120 million
parameters, where more than 95% come from the local and fully connected layers.
very few parameters. These layers merely expand the input into a set of simple local features.
The subsequent layers (L4, L5 and L6) are instead locally connected [13, 16]: like a convolutional layer they apply a filter bank, but every location in the feature map learns a different set of filters, since different regions of an aligned image have different local statistics.
The goal of training is to maximize the probability of the correct class (face id). We achieve this by minimizing the cross-entropy loss for each training sample. If k is the index of the true label for a given input, the loss is L = −log p_k. The loss is minimized over the parameters by computing the gradient of L w.r.t. the parameters.
Human: 95% vs. DeepFace in Facebook: 97.35%
Recognition Accuracy for Labeled Faces in the Wild (LFW) dataset (13,233 images, 5,749 people)
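The quoted loss is just the softmax cross-entropy; a tiny numeric sketch (with made-up scores) shows why L = −log p_k is small when the network ranks the true identity first:

```python
import numpy as np

def cross_entropy(scores: np.ndarray, k: int) -> float:
    scores = scores - scores.max()             # for numerical stability
    p = np.exp(scores) / np.exp(scores).sum()  # softmax probabilities
    return -np.log(p[k])                       # L = -log p_k

scores = np.array([2.0, 0.5, -1.0, 0.1])       # hypothetical scores over 4 face ids
print(cross_entropy(scores, k=0))              # small loss: true class scores highest
print(cross_entropy(scores, k=2))              # large loss: true class scores lowest
```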
205. FaceNet: A Unified Embedding for Face Recognition and Clustering
Schroff, F. et al. (2015). FaceNet: A Unified Embedding for Face Recognition and Clustering
Human: 95% vs. FaceNet of Google: 99.63%
Recognition Accuracy for Labeled Faces in the Wild (LFW) dataset (13,233 images, 5,749 people)
Figure 6. LFW errors (false accepts and false rejects). This shows all pairs of images that were incorrectly classified on LFW. Only eight of the 13 errors shown here are actual errors; the other four are mislabeled in LFW.
5.7. Performance on Youtube Faces DB
We use the average similarity of all pairs of the first one hundred frames that our face detector detects in each video. This gives us a classification accuracy of 95.12%±0.39. Using the first one thousand frames results in 95.18%. Compared to [17] 91.4%, who also evaluate one hundred frames per video, we reduce the error rate by almost half. DeepId2+ [15] achieved 93.2% and our method reduces this error by 30%, comparable to our improvement on LFW.
5.8. Face Clustering
Our compact embedding lends itself to be used in order to cluster a user's personal photos into groups of people with the same identity. The constraints in assignment imposed by clustering faces, compared to the pure verification task, lead to truly amazing results. Figure 7 shows one cluster in a user's personal photo collection, generated using agglomerative clustering. It is a clear showcase of the incredible invariance to occlusion, lighting, pose and even age.
Figure 7. Face Clustering. Shown is an exemplar cluster for one user. All these images in the user's personal photo collection were clustered together.
6. Summary
We provide a method to directly learn an embedding into an Euclidean space for face verification. This sets it apart from other methods [15, 17] who use the CNN bottleneck layer, or require additional post-processing such as concatenation of multiple models and PCA, as well as SVM classification. Our end-to-end training both simplifies the setup and shows that directly optimizing a loss relevant to the task at hand improves performance.
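FaceNet learns that embedding with a triplet loss; a minimal numpy sketch of the objective (random unit vectors stand in for real face embeddings):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    # Pull anchor toward the positive (same identity), push away from the
    # negative (different identity) by at least a margin alpha.
    d_pos = np.sum((anchor - positive) ** 2)  # squared L2 distance
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + alpha, 0.0)

def normalize(v):
    return v / np.linalg.norm(v)              # FaceNet embeds on the unit sphere

rng = np.random.default_rng(0)
a = normalize(rng.normal(size=128))           # FaceNet uses 128-d embeddings
p = normalize(a + 0.1 * rng.normal(size=128)) # same person: nearby embedding
n = normalize(rng.normal(size=128))           # different person
print(triplet_loss(a, p, n))                  # ~0 when the triplet is already satisfied
```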
206. Show and Tell: A Neural Image Caption Generator
Vinyals, O. et al. (2015). Show and Tell: A Neural Image Caption Generator, arXiv:1411.4555
Example outputs: "A group of people shopping at an outdoor market." / "There are many vegetables at the fruit stand."
Pipeline: Vision (Deep CNN) → Language (Generating RNN)
Figure 1. NIC, our model, is based end-to-end on a neural network consisting of a vision CNN followed by a language generating RNN.
207. Show and Tell: A Neural Image Caption Generator
Vinyals, O. et al. (2015). Show and Tell: A Neural Image Caption Generator, arXiv:1411.4555
Figure 5. A selection of evaluation results, grouped by human rating.
212. Business Area
Medical Image Analysis
VUNOnet and our machine learning technology will help doctors and hospitals manage medical scans and images intelligently, to make diagnoses faster and more accurate.
[Images: original image, automatic segmentation; classes: Normal, Emphysema, Reticular Opacity]
Our system finds DILDs with the highest accuracy (*DILD: Diffuse Interstitial Lung Disease)
Digital Radiologist
Collaboration with Prof. Joon Beom Seo (Asan Medical Center)
Analysed 1,200 patients over 3 months
214. Digital Radiologist
Med Phys. 2013 May;40(5):051912. doi: 10.1118/1.4802214.
Collaboration with Prof. Joon Beom Seo (Asan Medical Center)
Analysed 1200 patients for 3 months
215. Digital Radiologist
Med Phys. 2013 May;40(5):051912. doi: 10.1118/1.4802214.
Collaboration with Prof. Joon Beom Seo (Asan Medical Center)
Analysed 1200 patients for 3 months
216. Digital Radiologist
Med Phys. 2013 May;40(5):051912. doi: 10.1118/1.4802214.
Collaboration with Prof. Joon Beom Seo (Asan Medical Center)
Analysed 1200 patients for 3 months
Feature Engineering vs Feature Learning
• Visualization of hand-crafted features vs learned features in 2D
217. Bench to Bedside: Practical Applications
• Contents-based Case Retrieval
– Finding similar cases with a clinically matching context: a search engine for medical images
– Clinicians can refer to the diagnoses and prognoses of past similar patients to make better clinical decisions
– Accepted for presentation at RSNA 2017
Digital Radiologist
221.
Development and Validation of a Deep Learning Algorithm
for Detection of Diabetic Retinopathy
in Retinal Fundus Photographs
Varun Gulshan, PhD; Lily Peng, MD, PhD; Marc Coram, PhD; Martin C. Stumpe, PhD; Derek Wu, BS; Arunachalam Narayanaswamy, PhD;
Subhashini Venugopalan, MS; Kasumi Widner, MS; Tom Madams, MEng; Jorge Cuadros, OD, PhD; Ramasamy Kim, OD, DNB;
Rajiv Raman, MS, DNB; Philip C. Nelson, BS; Jessica L. Mega, MD, MPH; Dale R. Webster, PhD
IMPORTANCE Deep learning is a family of computational methods that allow an algorithm to
program itself by learning from a large set of examples that demonstrate the desired
behavior, removing the need to specify rules explicitly. Application of these methods to
medical imaging requires further assessment and validation.
OBJECTIVE To apply deep learning to create an algorithm for automated detection of diabetic
retinopathy and diabetic macular edema in retinal fundus photographs.
DESIGN AND SETTING A specific type of neural network optimized for image classification
called a deep convolutional neural network was trained using a retrospective development
data set of 128 175 retinal images, which were graded 3 to 7 times for diabetic retinopathy,
diabetic macular edema, and image gradability by a panel of 54 US licensed ophthalmologists
and ophthalmology senior residents between May and December 2015. The resultant
algorithm was validated in January and February 2016 using 2 separate data sets, both
graded by at least 7 US board-certified ophthalmologists with high intragrader consistency.
EXPOSURE Deep learning–trained algorithm.
MAIN OUTCOMES AND MEASURES The sensitivity and specificity of the algorithm for detecting
referable diabetic retinopathy (RDR), defined as moderate and worse diabetic retinopathy,
referable diabetic macular edema, or both, were generated based on the reference standard
of the majority decision of the ophthalmologist panel. The algorithm was evaluated at 2
operating points selected from the development set, one selected for high specificity and
another for high sensitivity.
RESULTS The EyePACS-1 data set consisted of 9963 images from 4997 patients (mean age, 54.4
years; 62.2% women; prevalence of RDR, 683/8878 fully gradable images [7.8%]); the
Messidor-2 data set had 1748 images from 874 patients (mean age, 57.6 years; 42.6% women;
prevalence of RDR, 254/1745 fully gradable images [14.6%]). For detecting RDR, the algorithm
had an area under the receiver operating curve of 0.991 (95% CI, 0.988-0.993) for EyePACS-1 and
0.990 (95% CI, 0.986-0.995) for Messidor-2. Using the first operating cut point with high
specificity, for EyePACS-1, the sensitivity was 90.3% (95% CI, 87.5%-92.7%) and the specificity
was 98.1% (95% CI, 97.8%-98.5%). For Messidor-2, the sensitivity was 87.0% (95% CI, 81.1%-
91.0%) and the specificity was 98.5% (95% CI, 97.7%-99.1%). Using a second operating point
with high sensitivity in the development set, for EyePACS-1 the sensitivity was 97.5% and
specificity was 93.4% and for Messidor-2 the sensitivity was 96.1% and specificity was 93.9%.
CONCLUSIONS AND RELEVANCE In this evaluation of retinal fundus photographs from adults
with diabetes, an algorithm based on deep machine learning had high sensitivity and
specificity for detecting referable diabetic retinopathy. Further research is necessary to
determine the feasibility of applying this algorithm in the clinical setting and to determine
whether use of the algorithm could lead to improved care and outcomes compared with
current ophthalmologic assessment.
JAMA. doi:10.1001/jama.2016.17216
Published online November 29, 2016.
223. Training Set / Test Set
• A CNN was trained retrospectively on 128,175 fundus images
• Each image was graded 3-7 times by a panel of 54 US licensed ophthalmologists
• The algorithm's readings were compared against those of 7-8 high-performing ophthalmologists
• Validation sets: EyePACS-1 (9,963 images), Messidor-2 (1,748 images)
eFigure 2. Screenshot of the second screen of the grading tool, which asks graders to assess the image for DR, DME and other notable conditions or findings
224. • AUC = 0.991 on EyePACS-1 and 0.990 on Messidor-2
• Sensitivity and specificity on par with the panel of 7-8 ophthalmologists
• F-score: 0.95 (vs. 0.91 for the human ophthalmologists)
Additional sensitivity analyses were conducted for several subcategories, and the effect of data set size on algorithm performance was examined and shown to plateau at around 60,000 images.
Figure 2. Validation Set Performance for Referable Diabetic Retinopathy. Performance of the algorithm (black curve) and ophthalmologists (colored circles) for the presence of referable diabetic retinopathy (moderate or worse diabetic retinopathy or referable diabetic macular edema) on A, EyePACS-1 (8788 fully gradable images; AUC, 99.1%; 95% CI, 98.8%-99.3%) and B, Messidor-2 (1745 fully gradable images; AUC, 99.0%; 95% CI, 98.6%-99.5%). The black diamonds correspond to the sensitivity and specificity of the algorithm at the high-sensitivity and high-specificity operating points. In A, for the high-sensitivity operating point, specificity was 93.4% (95% CI, 92.8%-94.0%) and sensitivity was 97.5% (95% CI, 95.8%-98.7%); for the high-specificity operating point, specificity was 98.1% (95% CI, 97.8%-98.5%) and sensitivity was 90.3% (95% CI, 87.5%-92.7%). In B, for the high-sensitivity operating point, specificity was 93.9% (95% CI, 92.4%-95.3%) and sensitivity was 96.1% (95% CI, 92.4%-98.3%); for the high-specificity operating point, specificity was 98.5% (95% CI, 97.7%-99.1%) and sensitivity was 87.0% (95% CI, 81.1%-91.0%). There were 8 ophthalmologists who graded EyePACS-1 and 7 ophthalmologists who graded Messidor-2. AUC indicates area under the receiver operating characteristic curve.
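A minimal sketch of how such operating points are chosen on a ROC curve, using synthetic scores rather than the study's data (scikit-learn for the curve itself):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)            # 1 = referable DR (synthetic labels)
scores = rng.normal(loc=2.0 * y, scale=1.0)  # algorithm score, higher if diseased

fpr, tpr, thr = roc_curve(y, scores)
print("AUC:", round(roc_auc_score(y, scores), 3))

i = int(np.argmax(tpr >= 0.975))             # first threshold reaching sens >= 97.5%
print(f"high-sensitivity point: sens={tpr[i]:.3f}, spec={1 - fpr[i]:.3f}")

j = int(np.searchsorted(fpr, 0.02, side="right")) - 1  # last threshold with spec >= 98%
print(f"high-specificity point: sens={tpr[j]:.3f}, spec={1 - fpr[j]:.3f}")
```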
228. In this task, the CNN achieves 72.1±0.9% (mean±s.d.) overall accuracy (the average of individual inference class accuracies) and two dermatologists attain 65.56% and 66.0% accuracy on a subset of the validation set. Second, we validate the algorithm using a nine-class disease partition—the second-level nodes—so that the diseases of each class have similar medical treatment plans. […] two trials, one using standard images and the other using dermoscopy images, which reflect the two steps that a dermatologist might use to obtain a clinical impression. The same CNN is used for all tasks. Figure 2b shows a few example images, demonstrating the difficulty in distinguishing between malignant and benign lesions, which share many visual features. Our comparison metrics are sensitivity and specificity.
Deep CNN layout (training classes: 757; layers include convolution, average pooling, max pooling, concatenation, dropout, fully connected, and softmax). Data flow is from left to right: an image of a skin lesion (for example, melanoma) is sequentially warped into a probability distribution over clinical classes of skin disease (for example, 92% malignant melanocytic lesion, 8% benign melanocytic lesion) using the Google Inception v3 CNN architecture, pretrained on the ImageNet dataset (1.28 million images, 1,000 generic object classes) and fine-tuned on our own dataset of 129,450 skin lesions comprising 2,032 different diseases. Training classes are defined using a novel taxonomy of skin disease and a partitioning algorithm that maps diseases into training classes (for example, acral-lentiginous melanoma, amelanotic melanoma, lentigo melanoma). Inference classes are more general and are composed of one or more training classes (for example, malignant melanocytic lesions, the class of melanomas). The probability of an inference class is computed by summing the probabilities of the training classes according to taxonomy structure (see Methods). Inception v3 CNN architecture reproduced from https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html
GoogLeNet Inception v3
• Built a dataset of 129,450 skin lesion images
• Data curated by 18 US dermatologists
• Trained a CNN (Inception v3) on the images
• Compared the algorithm's readings with those of 21 dermatologists on three tasks:
• keratinocyte carcinoma vs. benign seborrheic keratosis
• malignant melanoma vs. benign lesions (standard images)
• malignant melanoma vs. benign lesions (dermoscopy images)
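A minimal sketch, not the authors' code, of the transfer-learning recipe the slide describes: load an ImageNet-pretrained Inception v3 and fine-tune it on lesion classes (the class count comes from the paper; the data, hyperparameters, and weight download are placeholders/assumptions):

```python
import torch
import torch.nn as nn
from torchvision import models

n_classes = 757  # training classes in the paper
# Downloads pretrained weights on first use (requires network access).
model = models.inception_v3(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, n_classes)          # new main head
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, n_classes)

optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(2, 3, 299, 299)              # Inception v3 expects 299x299 inputs
y = torch.randint(0, n_classes, (2,))        # placeholder lesion labels
model.train()
out, aux = model(x)                          # main + auxiliary logits in train mode
loss = loss_fn(out, y) + 0.4 * loss_fn(aux, y)
loss.backward()
optimizer.step()
```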
229. Skin cancer classification performance of the CNN and dermatologists
[Figure: sensitivity-specificity curves of the algorithm with individual dermatologists and the average dermatologist overlaid. Panels with dermatologist comparisons: carcinoma (135 images; 25 dermatologists; algorithm AUC = 0.96), melanoma (130 images; 22 dermatologists; AUC = 0.94), melanoma under dermoscopy (111 images; 21 dermatologists; AUC = 0.91). Larger validation sets: carcinoma (707 images; AUC = 0.96), melanoma (225 images; AUC = 0.96), melanoma under dermoscopy (1,010 images; AUC = 0.94).]
• A substantial number of the 21 dermatologists were less accurate than the algorithm
• The dermatologists' average performance was also below the algorithm's
230. Skin cancer classification performance of the CNN and dermatologists (figure repeated from the previous slide)
232. Diagnostic Concordance Among Pathologists Interpreting Breast Biopsy Specimens
[Panels A-D: example biopsy images]
Benign without atypia / Atypia / DCIS (ductal carcinoma in situ) / Invasive carcinoma
Interpretation?
Elmore et al., JAMA 2015
233. Figure 4. Participating Pathologists' Interpretations of Each of the 240 Breast Biopsy Test Cases
[Four panels showing, for each case, the percentage distribution of pathologists' interpretations (benign without atypia / atypia / DCIS / invasive carcinoma):
A. Benign without atypia: 72 cases, 2070 total interpretations
B. Atypia: 72 cases, 2070 total interpretations
C. DCIS: 73 cases, 2097 total interpretations
D. Invasive carcinoma: 23 cases, 663 total interpretations]
DCIS indicates ductal carcinoma in situ.
Elmore et al., JAMA 2015
Diagnostic Concordance Among Pathologists Interpreting Breast Biopsy Specimens
234. (Figure 4 repeated from the previous slide.)
235. Diagnostic Concordance Among Pathologists Interpreting Breast Biopsy Specimens
Elmore et al., JAMA 2015
• Concordance noted in 5194 of 6900 case interpretations, or 75.3%.
• The reference diagnosis was obtained from the consensus of 3 experienced breast pathologists.
Pathologists' Diagnoses Compared With Consensus-Derived Reference Diagnoses
The 115 participants each interpreted 60 cases, providing 6900 total individual interpretations for comparison with the consensus-derived reference diagnoses (Figure 3). Participants agreed with the consensus-derived reference diagnosis for 75.3% of the interpretations (95% CI, 73.4%-77.0%).
Patient and Pathologist Characteristics Associated With Overinterpretation and Underinterpretation
The association of breast density with overall pathologists' concordance (as well as both overinterpretation and underinterpretation rates) was statistically significant, as shown in Table 3 when comparing mammographic density grouped into 2 categories (low density vs high density). The overall concordance estimates also decreased consistently with increasing breast density across all 4 Breast Imaging-Reporting and Data System (BI-RADS) density categories.
Figure 3. Comparison of 115 Participating Pathologists' Interpretations vs the Consensus-Derived Reference Diagnosis for 6900 Total Case Interpretations
Consensus reference diagnosis (rows) vs participating pathologists' interpretation (columns: benign without atypia / atypia / DCIS / invasive carcinoma / total):
• Benign without atypia: 1803 / 200 / 46 / 21 / 2070
• Atypia: 719 / 990 / 353 / 8 / 2070
• DCIS: 133 / 146 / 1764 / 54 / 2097
• Invasive carcinoma: 3 / 0 / 23 / 637 / 663
• Total: 2658 / 1336 / 2186 / 720 / 6900
DCIS indicates ductal carcinoma in situ.
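The headline 75.3% can be recomputed directly from the matrix above; a small sketch:

```python
import numpy as np

# Rows: consensus reference diagnosis; columns: participant interpretation.
matrix = np.array([
    [1803,  200,   46,  21],   # Benign without atypia
    [ 719,  990,  353,   8],   # Atypia
    [ 133,  146, 1764,  54],   # DCIS
    [   3,    0,   23, 637],   # Invasive carcinoma
])
agree = np.trace(matrix)       # diagonal = concordant interpretations
total = matrix.sum()
print(agree, total, f"{agree / total:.1%}")  # 5194 6900 75.3%
```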
245. PTSD (Post-Traumatic Stress Disorder)
• PTSD is a disorder in which, after experiencing a severe event such as war, torture, natural disaster, crime, or terrorism, the person continues to feel fear and trauma tied to the event
• Patients have nightmares or relive particular scenes like a movie's flashbacks, and avoid stimuli associated with the event
• These changes often disrupt everyday social life and are frequently accompanied by depression or anger disorders
• 15.6-17.1% of soldiers deployed to Iraq and 11.2% of those deployed to Afghanistan experience PTSD (NEJM, 2004)
248. Prolonged Exposure Therapy
• The principle proven most effective for treating PTSD
• Repeatedly exposes the patient to the situations and memories that carry the trauma, reducing stress and avoidance behavior
• The patient recalls the traumatic memory over and over; through this process the link between the specific memory and the fear response is weakened
249.
250. Limitations of prolonged exposure therapy
• Patients resist recalling the trauma, or cannot imagine it effectively
• That is, in fact, itself one of the symptoms of PTSD
• If the patient cannot vividly visualize the traumatic memory, the treatment effect decreases
How can we visualize a realistic situation for the patient?
252. Virtual Vietnam
• VR has been used to treat PTSD since the 1990s
• The first attempt: Virtual Vietnam (1997)
• Scenarios: moving through the jungle / flying in a military helicopter
• Graphics, effects, and scenarios were limited
• Every patient who had not responded to traditional psychotherapy showed meaningful improvement
"I saw the Vietnamese people and the tanks in the video."
256. The scenarios of Virtual Iraq
• City: a desolate street with old buildings, dilapidated apartments, warehouses, a mosque, and factories. Two versions: one nearly empty of people and traffic, and one busy with both
• City building interiors: some buildings are modeled inside so the patient can enter; interiors can be empty, or populated with few or many occupants
• Checkpoint: part of the city scenario, where vehicles stop before entering the city
• Small rural village: crumbling buildings and battle debris, abundant vegetation, with desert visible in the distance between the buildings
• Desert base: a desert military base with soldiers, tents, and military equipment
• Desert road: an unpaved road environment leading to the city, the village, or the desert base; composed of dunes, vegetation, old buildings, battle debris, and people along the roadside
Fig. 1. Outskirts of Virtual Iraq City / Fig. 2. Center Area of Virtual Iraq City / Fig. 3. Car Bombing in Virtual Iraq City / Fig. 4. Interior view from the Desert Road Humvee Scenario / Fig. 5. Turret view from the Desert Road Humvee Scenario / Fig. 6. IED Attack in the Desert Road Humvee Scenario
User-centered tests with the application were conducted at the Naval Medical Center–San Diego and within an Army Combat Stress Control Team in Iraq (see Figure 8). These tests assessed the usability of the prototype system application and fed an iterative design process. A clinical trial version of the application built from this process is currently being tested with PTSD-diagnosed personnel at a variety of sites.
257.
258. The Wizard of Oz: recreating war through sight, touch, hearing, and smell
• The therapist controls everything the patient experiences, in real time (the 'Wizard of Oz')
• The situation of the patient's actual trauma is recreated as closely as possible
• Visual, auditory, olfactory, and tactile conditions are all controllable
• Various military vehicles; nearby buildings, cars, or tanks can be blown up
• Aircraft or helicopters can appear overhead; day/night, rain/fog
• A wide range of situations can be reproduced:
• firefights, ambushes, incoming rockets
• a comrade killed or wounded; seeing bodies or human remains
• firing on enemy combatants or civilians, and so on
259. PTSD Checklist scores across treatment
Scores at baseline, post treatment, and 3-month follow-up are shown in Figure 4. For this same group, mean Beck Anxiety Inventory scores significantly decreased 33% from 18.6 (9.5) to 11.9 (13.6) (t=3.37, df=19, p<.003), and mean PHQ-9 (depression) scores decreased 49% from 13.3 (5.4) to 7.1 (6.7) (t=3.68, df=19, p<0.002) (see Figure 5).
• Overall, the 20 patients in the study showed significant improvement
• Mean PCL-M scores fell from 54.4 to 35.6
• 16 of the 20 patients no longer met PTSD criteria immediately after treatment
• Patients' gains were maintained at 3 months after the end of treatment
http://www.ncbi.nlm.nih.gov/pubmed/19377167
260. BAI and PHQ-Depression scores
The average number of sessions for this sample was just under 11. Also, two of the successful treatment completers had documented mild and moderate traumatic brain injuries, which suggests that this form of exposure can be usefully applied with this population.
• Beck Anxiety Inventory scores decreased 33%, from a mean of 18.6 to 11.9
• PHQ-9 depression scores likewise decreased 49%, from 13.3 to 7.1
• The treatment was also meaningfully effective for the 2 patients with mild traumatic brain injury
http://www.ncbi.nlm.nih.gov/pubmed/19377167
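The reported statistics are paired pre/post comparisons (df = 19 for the 20 patients); a minimal sketch with synthetic stand-in scores, not the study's data:

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
pre = rng.normal(18.6, 9.5, size=20)        # e.g., baseline BAI-like scores
post = pre - rng.normal(6.7, 4.0, size=20)  # simulated improvement after therapy

t, p = ttest_rel(pre, post)                 # paired t-test, df = n - 1 = 19
print(f"t={t:.2f}, df={len(pre) - 1}, p={p:.4f}")
```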
261. Three Steps to Implement Digital Medicine
• Step 1. Measure the Data
• Step 2. Collect the Data
• Step 3. Insight from the Data