Visualizing High-Dimensional Data with Manifold Learning in R
BY COLLEEN M. FARRELLY, DATA SCIENTIST AT GRAHAM HOLDINGS (KAPLAN HIGHER AND PROFESSIONAL EDUCATION)
My Path to Data Science
Former MD/PhD student who started doing research and attending workshops in geometry, topology, and machine learning
Switched degree programs into biostatistics with a topology-based slant
Have worked in biotechnology, military, education, and the social sciences
Currently on the business side of running a university, with a lot of financial modeling and risk modeling
Mining for Data Relationships
Exploratory analysis
 Important step in data science projects
 Trend/covariance visualization
 Clustering
 Powerful combination for understanding many types of problems
Types of data problems
 Time series analyses
 Predictive analyses
 Network analyses
[Figure: Intelligence and Achievement Dendrogram; hclust (*, "complete") on dist(mydata[, 2:4]); 17 observations; vertical axis: Height]
Unique subgroup identified
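A minimal sketch of how a dendrogram like this one can be produced with the two calls named in the figure, dist() and hclust() with complete linkage. The data frame below is a hypothetical stand-in for the slide's mydata: its column names and values are invented for illustration, and only the shape (17 rows, score columns in positions 2 to 4) follows the figure.

```r
# Hypothetical stand-in for the slide's `mydata`: an ID column followed by
# three score columns (names and values are invented for illustration).
set.seed(1)
mydata <- data.frame(
  id      = 1:17,
  iq      = c(rnorm(14, mean = 100, sd = 5), rnorm(3, mean = 140, sd = 5)),
  reading = c(rnorm(14, mean = 50,  sd = 5), rnorm(3, mean = 80,  sd = 5)),
  math    = c(rnorm(14, mean = 50,  sd = 5), rnorm(3, mean = 85,  sd = 5))
)

# Euclidean distances on columns 2-4, then complete-linkage clustering,
# matching the calls shown under the dendrogram.
d   <- dist(mydata[, 2:4])
fit <- hclust(d, method = "complete")
plot(fit, main = "Intelligence and Achievement Dendrogram")

# Cut the tree into two groups to look for a small, distinct subgroup.
groups <- cutree(fit, k = 2)
table(groups)
```

A branch that joins the rest of the tree only at a large height is the kind of unique subgroup the slide points to.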
Time Series and Financial Data
Key tasks in time series/financial data analyses:
 Forecasting future time points
 Identifying drivers of the dynamic process (e.g., why are sales rising?)
 Identifying tipping points (crashes, spikes…)
 Identifying covarying behavior (sectors that behave similarly, stocks that influence each other, daily rising/falling patterns…)
Dow Jones Industrial Average
Morse-Smale Clustering
Multivariate technique from topology similar to mode clustering
 Find peaks and valleys in data by filtering on a defined function:
 A watershed on mountains
 Dribbling a soccer ball across a field of hills
 Separate data based on shared peaks and valleys
 Many nice developments on convergence and theoretical properties
R package has nice dimensionality reduction plots to highlight cluster differences with respect to the filter function and predictor sets
Dimensionality Reduction and Visualization
Helpful in visualizing multivariate trends and group differences, particularly for multivariate time series data
Assume data lies in a lower-dimensional subspace and map full dataset to that subspace (right)
Types of methods:
 Linear (principal component analysis, or PCA)
 Nonlinear (manifold learning)
 Local (preserving neighborhood metrics like distance between points)
 Global (preserving global characteristics like connectedness and limits)
Manifold learning methods related to a branch of mathematics called differential geometry
Manifold Learning Methods
Three main methods considered in this analysis:
 Multidimensional scaling (MDS)
 Global method based on distance preservation and matrix decomposition
 Distances can be Euclidean, geodesic, Manhattan...
 Nice theoretical result relating it to PCA when best subspace is linear
 Locally linear embedding (LLE)
 Local method based on nearest neighbor graph, weighting, and matrix decomposition
 Related to ISOMAP and other methods
 t-distributed stochastic neighbor embedding (t-SNE)
 Local and global method based on mapping of probability distributions and random walks
 Preserves both local and global characteristics of the original data space
 Very strong performance on a variety of problems lately
Breast Cancer Dataset Comparison
Example Stock Market Dataset
Emerging markets
 Important for investors
 Future drivers of global trade
 Global trends
 Daily fluctuations
 Tipping points (crashes and opportunities)
This example:
 Recent Kaggle dataset of daily National Stock Exchange of India prices from July 2003 to February 2018:
 https://www.kaggle.com/abhishekyana/nse-listed-1384-companies-data/data
 Cleaned (nulls removed, <1%) and daily fluctuation ranges added (7 total time series columns)
 3616 days included
Clustering Results
R package (msr)
 10 nearest neighbors
 Persistence level = 1
 5 level splits
 Plot of group trajectories (far left)
4 distinct groups
 2 represent stable trends (red, blue)
 2 represent transition points in market behavior (green, aqua)
PCA Plot
R function princomp() with 2 components
Fits quite well and shows spread within each cluster
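A minimal sketch of a two-component PCA with princomp(), the function named on this slide. The price series below are simulated, hypothetical stand-ins for the NSE columns, not the talk's data. princomp() has no argument for the number of components, so the sketch simply keeps the first two columns of the scores.

```r
# Simulated stand-in for daily price columns (hypothetical values).
set.seed(42)
n     <- 200
level <- cumsum(rnorm(n, mean = 0.5))        # a drifting price level
prices <- data.frame(
  open  = level + rnorm(n, sd = 0.2),
  high  = level + 1 + rnorm(n, sd = 0.2),
  low   = level - 1 + rnorm(n, sd = 0.2),
  close = level + rnorm(n, sd = 0.2)
)

# Fit PCA and keep the scores on the first two components for plotting.
pca    <- princomp(prices)
scores <- pca$scores[, 1:2]
plot(scores, xlab = "Component 1", ylab = "Component 2", main = "PCA Results")

# Proportion of variance explained by each component.
summary(pca)
```

Coloring the points by cluster membership (for example with the col argument of plot()) would show the spread within each group.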
MDS Plot
R function cmdscale() with 2 components and a Euclidean distance metric
Relationships very linear and well-separated globally
 Matches PCA well
 Separates into:
1. Daily price
2. Daily fluctuation
[Figure: MDS Results; axes: Dimension 1, Dimension 2]
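The close match between MDS and PCA is expected: classical MDS on Euclidean distances recovers the principal component scores, up to a sign flip on each axis. A minimal sketch on random, hypothetical data (not the NSE series) that checks this numerically with cmdscale() and princomp():

```r
# Random illustrative data: 100 observations, 5 variables.
set.seed(7)
x <- matrix(rnorm(100 * 5), ncol = 5)

# Classical MDS with 2 components on Euclidean distances.
mds <- cmdscale(dist(x), k = 2)

# First two principal component scores of the same data.
pca <- princomp(x)$scores[, 1:2]

# Each axis may be flipped in sign, so compare absolute values.
max_diff <- max(abs(abs(mds) - abs(pca)))
max_diff

plot(mds, xlab = "Dimension 1", ylab = "Dimension 2", main = "MDS Results")
```

With a non-Euclidean metric (for example dist(x, method = "manhattan")) the two embeddings are no longer guaranteed to coincide.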
LLE Plot
R function lle() with 2 components and 10 nearest neighbors (lle package)
Separation and fit not great
Suggests global behavior more important than local for this time series
[Figure: LLE Results; axes: Dimension 1, Dimension 2]
t-SNE Plot
R package dimRed with function getDimRedData(), perplexity (smoothing) at 80, 2 components, and tsne method
Parses out tipping points within growth period and exact moments of transitional events (see green group)
[Figure: tSNE Results; axes: Dimension 1, Dimension 2]
Deep Dive into MDS Components
MDS components separate into prices (component 1) and fluctuation ranges (component 2), summarized in correlation table
Fluctuation ranges increasing as the market gains points (left)
Original Time Series    MDS Component 1    MDS Component 2
open                    1.00E+00            3.25E-03
high                    1.00E+00           -6.71E-03
low                     1.00E+00            9.00E-03
fluctuation.range       6.84E-01           -7.06E-01
close                   1.00E+00           -2.56E-03
day.range               5.14E-01           -7.47E-01
adj_close               1.00E+00           -2.41E-03
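A correlation table of this kind can be built by correlating each original series with each MDS coordinate. The sketch below is a hedged illustration on simulated data: the three columns and their values are hypothetical, with a range column constructed to grow with the price level, so the numbers will not reproduce the table above.

```r
# Simulated stand-in series (hypothetical values, not the NSE data).
set.seed(3)
n     <- 300
level <- cumsum(abs(rnorm(n, mean = 1)))          # steadily rising price level
series <- data.frame(
  open      = level + rnorm(n, sd = 0.5),
  close     = level + rnorm(n, sd = 0.5),
  day.range = 0.05 * level * runif(n, 0.5, 1.5)   # range grows with the level
)

# Two-component classical MDS on Euclidean distances.
mds <- cmdscale(dist(series), k = 2)
colnames(mds) <- c("MDS Component 1", "MDS Component 2")

# Rows: original series; columns: MDS components.
cor_table <- cor(series, mds)
round(cor_table, 3)
```

The sign of each MDS component is arbitrary, so only the magnitude and pattern of the correlations carry meaning.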
Transition Periods Deep Dive
Transition periods overlap with long-term trends
Shorter time-to-transition periods in recent years
Results Overview
NSE shows exponential growth during a period of change
 New regulations
 Oil price drops
 Fall of inflation
Tipping points of growth
 Includes current period, starting late 2017/early 2018
 The February 2018 tumble of the NSE was flagged in late 2017
 Crash predicted by several economists for sometime in 2018:
 https://www.getmoneyrich.com/indian-stock-market-correction-likely-in-2017-2018/
 https://www.livemint.com/Money/pXdnLHA2r1FJfwJhFEDqjO/Stock-market-crash-Experts-divided-on-whether-theres-more.html
Fluctuations and volatility
 Increasing in past few years
 Prices can vary a lot during the day while opening and closing at similar values
Conclusions
Clustering and dimensionality reduction for multivariate data exploration
 Helpful for understanding multivariate time series data
 Helpful for understanding other types of data prior to analysis
Performed well in this example, showing deviations in behavior before major events
Can provide an understanding of covariance structure (relationships between stocks, volatility within a market…)
References
Farrelly, C. M. (2017). Dimensionality Reduction Ensembles. arXiv preprint arXiv:1710.04484.
Gerber, S., Rübel, O., Bremer, P. T., Pascucci, V., & Whitaker, R. T. (2013). Morse–Smale regression. Journal of Computational and Graphical Statistics, 22(1), 193-214.
Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1), 1-27.
van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2579-2605.
Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323-2326.
Wold, S., Esbensen, K., & Geladi, P. (1987). Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1-3), 37-52.
ResearchGate profile with folder for talk (data, R code, PPT):
https://www.researchgate.net/profile/Colleen_Farrelly2

 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 

High-Dimensional Data Visualization, Geometry, and Stock Market Crashes

  • 1. Visualizing High-Dimensional Data with Manifold Learning in R
      By Colleen M. Farrelly, Data Scientist at Graham Holdings (Kaplan Higher and Professional Education)
  • 2. My Path to Data Science
      • Former MD/PhD student who started doing research and attending workshops in geometry, topology, and machine learning
      • Switched degree programs into biostatistics with a topology-based slant
      • Has worked in biotechnology, military, education, and the social sciences
      • Currently on the business side of running a university, with a lot of financial modeling and risk modeling
  • 3. Mining for Data Relationships
      Exploratory analysis:
      • Important step in data science projects
      • Trend/covariance visualization
      • Clustering
      • Powerful combination for understanding many types of problems
      Types of data problems:
      • Time series analyses
      • Predictive analyses
      • Network analyses
      [Figure: Intelligence and Achievement Dendrogram, complete-linkage hclust on dist(mydata[, 2:4]); vertical axis: Height; annotation: unique subgroup identified]
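The dendrogram on this slide is labelled with the calls `hclust (*, "complete")` and `dist(mydata[, 2:4])`, so the clustering step can be sketched in base R. This is a minimal illustration on made-up data, not the talk's dataset: the 17 rows match the number of leaves in the figure, but the column names (`iq`, `reading`, `math`) and the values are assumptions chosen so that a small, distinct subgroup exists.

```r
# Hypothetical stand-in for `mydata`: an id column plus three numeric scores.
# Rows 15-17 are shifted far from the rest to mimic a "unique subgroup".
set.seed(7)
mydata <- data.frame(
  id      = 1:17,
  iq      = c(rnorm(14, mean = 100, sd = 10), rnorm(3, mean = 300, sd = 10)),
  reading = c(rnorm(14, mean = 100, sd = 10), rnorm(3, mean = 300, sd = 10)),
  math    = c(rnorm(14, mean = 100, sd = 10), rnorm(3, mean = 300, sd = 10))
)

# Euclidean distances on columns 2-4, then complete-linkage clustering,
# matching the calls printed under the dendrogram.
hc <- hclust(dist(mydata[, 2:4]), method = "complete")

# Cut the tree into two groups; plot(hc) would draw the dendrogram itself.
groups <- cutree(hc, k = 2)
print(table(groups))
```

Cutting at two groups is only one choice; on real data the cut height would usually be read off the dendrogram.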
  • 4. Time Series and Financial Data
      Key tasks in time series/financial data analyses:
      • Forecasting future time points
      • Identifying drivers of the dynamic process (e.g., why are sales rising?)
      • Identifying tipping points (crashes, spikes…)
      • Identifying covarying behavior (sectors that behave similarly, stocks that influence each other, daily rising/falling patterns…)
      [Figure: Dow Jones Industrial Average]
  • 5. Morse-Smale Clustering
      Multivariate technique from topology, similar to mode clustering:
      • Finds peaks and valleys in data by filtering on a defined function, in the way that:
          • A watershed forms on mountains
          • A soccer ball rolls when dribbled across a field of hills
      • Separates data based on shared peaks and valleys
      • Backed by many results on convergence and other theoretical properties
      The R package provides dimensionality reduction plots that highlight cluster differences with respect to the filter function and predictor sets
  • 6. Dimensionality Reduction and Visualization
      Helpful in visualizing multivariate trends and group differences, particularly for multivariate time series data
      Assumes the data lie in a lower-dimensional subspace and maps the full dataset to that subspace (figure at right)
      Types of methods:
      • Linear (principal component analysis, or PCA)
      • Nonlinear (manifold learning)
      • Local (preserving neighborhood metrics like distance between points)
      • Global (preserving global characteristics like connectedness and limits)
      Manifold learning methods are related to a branch of mathematics called differential geometry
  • 7. Manifold Learning Methods
      Three main methods considered in this analysis:
      • Multidimensional scaling (MDS)
          • Global method based on distance preservation and matrix decomposition
          • Distances can be Euclidean, geodesic, Manhattan...
          • Theoretical result relates it to PCA when the best subspace is linear
      • Locally linear embedding (LLE)
          • Local method based on a nearest-neighbor graph, weighting, and matrix decomposition
          • Related to ISOMAP and other methods
      • t-distributed stochastic neighbor embedding (t-SNE)
          • Local and global method based on mapping of probability distributions and random walks
          • Preserves both local and global characteristics of the original data space
          • Very strong performance on a variety of problems lately
      [Figure: Breast Cancer Dataset Comparison]
  • 8. Example Stock Market Dataset
      Emerging markets:
      • Important for investors
      • Future drivers of global trade
      Global trends:
      • Daily fluctuations
      • Tipping points (crashes and opportunities)
      This example:
      • Recent Kaggle dataset of daily National Stock Exchange of India (NSE) prices from July 2003 to February 2018: https://www.kaggle.com/abhishekyana/nse-listed-1384-companies-data/data
      • Cleaned (nulls removed, <1% of records) and daily fluctuation ranges added (7 time series columns in total)
      • 3616 days included
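The cleaning step described on this slide (dropping nulls and adding daily fluctuation ranges) might look roughly like the sketch below. The slides do not define the two added columns, so the formulas here are assumptions: `fluctuation.range` is taken as high minus low, and `day.range` as the absolute open-to-close move. The column names follow the table on slide 14; the prices are invented.

```r
# Toy stand-in for the NSE price table; values are made up for illustration.
prices <- data.frame(
  open      = c(100, 102,  NA, 105),
  high      = c(103, 106, 107, 109),
  low       = c( 99, 101, 103, 104),
  close     = c(102, 105, 106, 108),
  adj_close = c(102, 105, 106, 108)
)

# Drop rows containing any missing value.
clean <- na.omit(prices)

# Assumed definitions (not confirmed by the slides):
clean$fluctuation.range <- clean$high - clean$low        # intraday high-low spread
clean$day.range         <- abs(clean$close - clean$open) # open-to-close move

print(clean)
```

With the two derived columns the table has seven series, which matches the column count reported on the slide.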
  • 9. Clustering Results
      R package (msr):
      • 10 nearest neighbors
      • Persistence level = 1
      • 5 level splits
      • Plot of group trajectories (far left)
      4 distinct groups:
      • 2 represent stable trends (red, blue)
      • 2 represent transition points in market behavior (green, aqua)
  • 10. PCA Plot
      R function princomp() with 2 components
      The projection fits quite well and shows the spread within each cluster
  • 11. MDS Plot
      R function cmdscale() with 2 components and a Euclidean distance metric
      Relationships are very linear and well separated globally:
      • Matches PCA well
      • Separates into:
          1. Daily price
          2. Daily fluctuation
      [Figure: MDS Results; axes: Dimension 1, Dimension 2]
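The agreement between the MDS and PCA plots is expected: classical MDS on Euclidean distances recovers the principal component scores up to the sign of each axis. The sketch below checks this with `princomp()` and `cmdscale()`, the two functions named on the slides, using synthetic data as a stand-in for the seven NSE series (the data and column scales are assumptions).

```r
# Synthetic stand-in: 200 observations of 7 variables with unequal spread.
set.seed(1)
x <- sweep(matrix(rnorm(200 * 7), ncol = 7), 2, c(10, 5, 1, 1, 1, 1, 1), "*")

# PCA scores on the first two components.
pca_scores <- princomp(x)$scores[, 1:2]

# Classical MDS on Euclidean distances, two dimensions.
mds_coords <- cmdscale(dist(x), k = 2)

# Each axis is identified only up to sign, so compare absolute correlations
# between matching PCA and MDS dimensions.
agreement <- abs(diag(cor(pca_scores, mds_coords)))
print(round(agreement, 6))
```

With a non-Euclidean metric (for example Manhattan or geodesic distances) the two embeddings would generally differ.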
  • 12. LLE Plot
      R function lle() with 2 components and 10 nearest neighbors (lle package)
      Separation and fit are poor
      This suggests that global behavior is more important than local behavior for this time series
      [Figure: LLE Results; axes: Dimension 1, Dimension 2]
  • 13. t-SNE Plot
      R package dimRed with function getDimRedData(), perplexity (smoothing) at 80, 2 components, and tsne method
      Parses out tipping points within the growth period and exact moments of transitional events (see green group)
      [Figure: tSNE Results; axes: Dimension 1, Dimension 2]
  • 14. Deep Dive into MDS Components
      MDS components separate into prices (component 1) and fluctuation ranges (component 2), summarized in the correlation table below
      Fluctuation ranges increase as the market gains points (figure at left)

      Original Time Series   MDS Component 1   MDS Component 2
      open                   1.00E+00           3.25E-03
      high                   1.00E+00          -6.71E-03
      low                    1.00E+00           9.00E-03
      fluctuation.range      6.84E-01          -7.06E-01
      close                  1.00E+00          -2.56E-03
      day.range              5.14E-01          -7.47E-01
      adj_close              1.00E+00          -2.41E-03
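A correlation table of this kind can be produced by correlating each original series with each MDS coordinate. The following is a sketch on simulated prices, so the numbers it prints will not reproduce the values reported above; the simulated random-walk price level, the spread, and the names `MDS1` and `MDS2` are all assumptions made for illustration.

```r
# Simulated stand-in: a drifting price level plus an intraday spread.
set.seed(42)
n      <- 300
level  <- 100 + cumsum(rnorm(n, mean = 1, sd = 2))  # random walk with drift
spread <- abs(rnorm(n, sd = 3))                     # half of the high-low range

toy <- data.frame(
  open              = level,
  high              = level + spread,
  low               = level - spread,
  close             = level + rnorm(n, sd = 0.5),
  fluctuation.range = 2 * spread
)

# Two-dimensional classical MDS on Euclidean distances between days.
mds <- cmdscale(dist(toy), k = 2)
colnames(mds) <- c("MDS1", "MDS2")

# Rows: original series; columns: MDS components.
loadings_table <- round(cor(toy, mds), 2)
print(loadings_table)
```

Because MDS axes have arbitrary sign, only the magnitude and relative sign pattern of the correlations carry meaning.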
  • 15. Transition Periods Deep Dive
      • Transition periods overlap with long-term trends
      • Shorter time-to-transition periods in recent years
  • 16. Results Overview
      NSE shows exponential growth during a period of changes:
      • New regulations
      • Oil price drops
      • Fall of inflation
      Tipping points of growth:
      • Includes the current period, starting late 2017/early 2018
      • Predicted, in late 2017, the tumble of the NSE in February 2018
      • Crash predicted by several economists for sometime in 2018:
          • https://www.getmoneyrich.com/indian-stock-market-correction-likely-in-2017-2018/
          • https://www.livemint.com/Money/pXdnLHA2r1FJfwJhFEDqjO/Stock-market-crash-Experts-divided-on-whether-theres-more.html
      Fluctuations and volatility:
      • Increasing in the past few years
      • Prices can vary a lot during the day while opening and closing at similar values
  • 17. Conclusions
      Clustering and dimensionality reduction for multivariate data exploration:
      • Helpful for understanding multivariate time series data
      • Helpful for understanding other types of data prior to analysis
      The approach performed very well on this dataset, showing behavior deviations before major events
      It can also provide an understanding of covariance structure (relationships between stocks, volatility within a market…)
  • 18. References
      • Farrelly, C. M. (2017). Dimensionality Reduction Ensembles. arXiv preprint arXiv:1710.04484.
      • Gerber, S., Rübel, O., Bremer, P. T., Pascucci, V., & Whitaker, R. T. (2013). Morse-Smale regression. Journal of Computational and Graphical Statistics, 22(1), 193-214.
      • Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1), 1-27.
      • van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2579-2605.
      • Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323-2326.
      • Wold, S., Esbensen, K., & Geladi, P. (1987). Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1-3), 37-52.
      ResearchGate profile with folder for talk (data, R code, PPT): https://www.researchgate.net/profile/Colleen_Farrelly2