Mining dynamic social networks from
public news articles for company value
prediction.
- PRATIK, MICHEL, KAI & MINGHAO
Objectives and Key notes
What we discovered!
1. Study, analyze and understand impactful relations that exist between companies.
2. Transform the discovered relations into intercompany networks, revealing features
and metrics about the company.
3. Generate models that integrate network-feature metrics with company
financial valuations in order to project or predict a company’s future
value or profit over time, e.g. the number of companies a company relates with
(network feature metric) and the company’s profit (financial metric).
Concepts and Techniques utilized.
Network Analysis
• Graph theory
• Ranking
Machine learning Algorithms
• Regression ($y = a + bx$)
Statistical Methods
• Correlation ($R^2$)
• Mean Squared Error
Algebraic equations
• e.g. the formula used for the relation score
Choice of research domain
Document-level and sentence-level co-occurrence
The more companies co-appear or are described together in important news articles
and/or sentences, the stronger their mutual relationship.
NB: The study doesn’t extract specific relations separately but rather generalizes all
co-occurrences as impact relations, i.e., how much impact a company receives from
others, by considering positive/negative structural impacts from networks.
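The counting idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: company mentions are found by plain substring matching, sentences are split naively on punctuation, and the two sample articles are invented; none of this is taken from the paper.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(articles, companies):
    """Count document-level and sentence-level co-occurrences of company pairs."""
    doc_counts, sent_counts = Counter(), Counter()
    for text in articles:
        # Document level: every pair of companies named anywhere in the article.
        in_doc = sorted(c for c in companies if c in text)
        doc_counts.update(combinations(in_doc, 2))
        # Sentence level: naive split on ., ! and ? (an assumption, not the paper's method).
        for sentence in re.split(r"[.!?]+", text):
            in_sent = sorted(c for c in companies if c in sentence)
            sent_counts.update(combinations(in_sent, 2))
    return doc_counts, sent_counts


# Invented sample articles, for illustration only.
articles = [
    "IBM and Microsoft compete in software. Microsoft announced a new product.",
    "IBM acquired SPSS. The deal strengthens IBM.",
]
doc_counts, sent_counts = cooccurrence_counts(articles, ["IBM", "Microsoft", "SPSS"])
print(doc_counts[("IBM", "Microsoft")], sent_counts[("IBM", "Microsoft")])
```

Pairs are stored in alphabetical order, so each unordered pair is counted once per document or sentence.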
Research Coverage
For a Target company
Generation of inter-company networks entailing Local and global relations, historical
relations and the delta change in impact of relations over time.
Borrowing the idea of the PageRank algorithm used in information retrieval systems,
companies are ranked by each network feature and by company valuation (e.g. profit).
Machine learning algorithms such as linear regression and SVM regression are used to
combine the features of the longitudinal network with a company’s financial
information to predict the company value.
Extracting Data
New York Times
Social Network Data
Drawn from the large-scale public data about companies available in the news and
electronically on the web (mainly news articles). The data cover 1981–2009
(year by year).
e.g. IBM appeared in about 300 news articles in the New York Times in 2009 (277 articles
as IBM and 84 articles as International Business Machines).
Interviews, Questionnaires and Observations.
Financial Data.
• Company valuations were also obtained from the New York Times Fortune 500 List (1955–2009).
Pre-processing the data
For a Target company
For target company x, let the candidate company be y (one that is impacting x in a period of
time t). Sets of documents D and sentences S in which they’ve co-occurred during time t
are collected.
Generating longitudinal directed/undirected and valued/unvalued networks over a
period of years for a set of companies $V$.
$G^{T} = \{G^{t_1}, G^{t_2}, G^{t_3}, \ldots\}$, where $t_1 < t_2 < t_3 < \cdots$
For each company $x \in V$, a structural feature vector $F_x^{T}$ is generated, with $F_x^{T} \subseteq G^{T}$, where $F_x^{T}$ indicates the network
effects for target company x.
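One way to hold such a longitudinal network in code is a time-ordered list of yearly snapshots. The sketch below is an assumption about representation only: the record layout `(year, company_a, company_b, weight)`, the function name and the sample weights are all hypothetical, and edges are mirrored to give an undirected, valued network.

```python
from collections import defaultdict


def build_longitudinal_network(records):
    """Group (year, company_a, company_b, weight) records into one graph per year."""
    networks = defaultdict(lambda: defaultdict(dict))
    for year, a, b, weight in records:
        networks[year][a][b] = weight  # valued edge a -> b
        networks[year][b][a] = weight  # mirrored, so the snapshot is undirected
    # Snapshots are returned in time order: t1 < t2 < t3 ...
    return [
        (year, {node: dict(nbrs) for node, nbrs in networks[year].items()})
        for year in sorted(networks)
    ]


# Hypothetical relation weights, for illustration only.
records = [
    (2009, "IBM", "Microsoft", 3.2),
    (2008, "IBM", "SPSS", 0.7),
    (2009, "IBM", "SPSS", 1.1),
]
snapshots = build_longitudinal_network(records)
for year, graph in snapshots:
    print(year, graph["IBM"])
```

Dropping the mirrored assignment would give the directed variant, and replacing the weight with `True` would give the unvalued one.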
Evolution of Networks
Calculating Impact relation Strength
Algorithm
$Score_x(y) = a \cdot \sum_{i \in D^{t}_{x,y}} w_d(i) + b \cdot \sum_{j \in S^{t}_{x,y}} w_s(j)$
$w_d(i)$ and $w_s(j)$ are the weights computed for the documents and
sentences in which target company $x$ and candidate company $Y$ co-occur.
$w_d(i) = \log\left(1 + \frac{1}{Y'_i} + \frac{tf_x(i)}{\sum_{y \in \{x, Y\}} tf_y(i)}\right)$
$w_s(i) = \log\left(1 + \frac{1}{Y''_i}\right)$
e.g. IBM in 2009: Microsoft apparently had the greatest impact on IBM in 2009. They co-occurred in 55
articles and were described together in 264 sentences. From these sentences, we can infer that they are direct
competitors.
Sometimes the impact isn’t obvious. SPSS and IBM are not competitors and co-occurred in only 1 article and in 3
sentences, but their relation is important because they co-appeared in a high-weight
document (an article devoted entirely to the acquisition relation between SPSS and IBM).
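A minimal sketch of the relation score may help make the weighting concrete. It rests on assumptions the slides do not state: $Y'_i$ and $Y''_i$ are read as the number of companies mentioned in document i and sentence i, the logarithm is taken as natural, and the coefficients `a` and `b` and all sample counts are invented.

```python
import math


def doc_weight(n_companies, tf_x, tf_y):
    """w_d(i): n_companies stands in for Y'_i (assumed: companies named in document i)."""
    return math.log(1 + 1 / n_companies + tf_x / (tf_x + tf_y))


def sent_weight(n_companies):
    """w_s(i): n_companies stands in for Y''_i (assumed: companies named in sentence i)."""
    return math.log(1 + 1 / n_companies)


def relation_score(docs, sents, a=1.0, b=1.0):
    """Score_x(y) = a * sum of document weights + b * sum of sentence weights."""
    return a * sum(doc_weight(*d) for d in docs) + b * sum(sent_weight(s) for s in sents)


# Hypothetical documents as (companies in document, tf of x, tf of candidate).
focused = doc_weight(2, 3, 3)   # article about only two companies
broad = doc_weight(10, 1, 1)    # article mentioning ten companies
score = relation_score(docs=[(2, 3, 3)], sents=[2])
print(focused > broad, round(score, 4))
```

Under these assumptions a single article devoted to two companies outweighs one passing mention in a crowded article, which matches the SPSS and IBM example above.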
Mining Longitudinal Network
Network effects
Six types of network effects are considered.
1. The number of connections that the target company x has.
2. Distance between x and its related nodes.
3. The number of connections that the companies related to the target company have.
4. Number of connections among x’s related nodes.
5. Distance between the target company’s related nodes.
6. Number of node pairs having x on the shortest path.
Mining Longitudinal Network
1. Network effects generation
A set of nodes that directly or indirectly impact the focal company x is generated: $N_x$.
Three different types of node pairs are defined:
$(x, i)\ \forall\, i \in N_x$,
$(i, j)\ \forall\, i, j \in N_x,\ i \neq j$, and
$(i, k)\ \forall\, i \in N_x,\ k \in V$.
Measures of degree connectivity $\beta(i, j)$, eccentricity $\mu(i, j)$, and betweenness $\zeta_x(i, j)$ are
computed and then standardized to the network size $|V|$.
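The three kinds of measure can be illustrated on a toy undirected, unvalued graph. This is a simplified sketch, not the paper's definitions: degree is the neighbor count of x, eccentricity is the largest shortest-path distance from x, the betweenness-style count is the number of node pairs with x on at least one shortest path, and each value is divided by the number of nodes. The graph and names are made up.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def network_effects(graph, x):
    """Degree, eccentricity and pairs-through-x for node x, each divided by |V|."""
    size = len(graph)
    dist = {v: bfs_distances(graph, v) for v in graph}
    others = [v for v in graph if v != x]
    # x lies on a shortest i-j path when d(i, x) + d(x, j) == d(i, j).
    pairs_through_x = sum(
        1
        for idx, i in enumerate(others)
        for j in others[idx + 1:]
        if x in dist[i] and j in dist[i] and j in dist[x]
        and dist[i][x] + dist[x][j] == dist[i][j]
    )
    raw = {
        "degree": len(graph[x]),
        "eccentricity": max(dist[x].values()),
        "pairs_through_x": pairs_through_x,
    }
    return {name: value / size for name, value in raw.items()}


# Toy path graph A - X - B - C (hypothetical companies).
graph = {"A": {"X"}, "X": {"A", "B"}, "B": {"X", "C"}, "C": {"B"}}
effects = network_effects(graph, "X")
print(effects)
```

Running one breadth-first search per node is fine for a few dozen companies; a larger network would call for a dedicated graph library.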
Further analysis on the Networks
Traversing the valued directed network for more patterns revealing possible impact
relations.
1. Two new sub-networks are incorporated.
Neighboring node sets $L_x$, which are considered to exert an impact on x through their
direct connection to $N_x$.
• NB: $L_x : N_x$ shows the degree to which companies are directly related to x rather than
indirectly.
2. Only arcs (directed edges) are retained, to reveal who is impacting whom.
3. Step 1 (network effects generation, previous slide) is repeated to obtain historical
network effects.
Network Feature Selection
Filtering out companies with maximum Impact
Individual feature selection.
Companies are ranked by network features 𝑓𝑖 and by their valuations (profit).
𝑋𝑖 – Rank vector of companies ranked by network feature
Y – Companies ranked by their valuations like profit.
Spearman’s rank correlation is calculated between 𝑋𝑖 and Y.
The salient implication is that if the ratio of the number of
connections that a company has to the number of connections that its neighbors
have increases, then its profit will increase.
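The ranking step can be sketched with the textbook Spearman formula, $\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, which holds when there are no tied values. The feature and profit numbers below are invented, and the helper names are hypothetical.

```python
def ranks(values):
    """Rank 1 goes to the largest value; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman's rank correlation via the no-ties shortcut formula."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Invented values for four companies: one network feature and their profits.
feature = [0.9, 0.4, 0.7, 0.1]
profit = [120, 30, 80, 10]
rho_same = spearman(feature, profit)
rho_reversed = spearman(feature, profit[::-1])
print(rho_same, rho_reversed)
```

A feature whose ranking tracks the profit ranking scores close to 1 and is a candidate for the prediction model; one that scores near 0 carries little ranking information.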
Prediction Model
Network effects + Company valuations
Longitudinal network effects as well as valuations of each target company x are integrated into
Linear regression model (LRM) – Predicting a company’s current or future financial value.
Support vector regression model (SVR) – To learn Parameters.
Experimental results.
20 Fortune companies are selected as a sample. Their valuation records, i.e. profits, are captured and
networks are generated.
First, the mean profit of the companies is calculated. The model is then trained on records
spanning a five-year window of networks and tested by predicting the profits of the next five years; the predictions are
then compared with the actual values.
This is repeated for a single company.
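The train-then-test procedure can be sketched with the single-variable regression $y = a + bx$ listed among the techniques. This is a deliberately reduced illustration: one hypothetical network feature per year instead of the full feature set, synthetic profits that are exactly linear, ordinary least squares in place of the LRM and SVR models, and mean squared error as the comparison.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return a, b


def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    return sum((p - q) ** 2 for p, q in zip(actual, predicted)) / len(actual)


# Synthetic data: ten years of one network feature and mean profit.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
profit = [2 * f + 1 for f in feature]

# Train on the first five years, test on the next five.
a, b = fit_line(feature[:5], profit[:5])
predicted = [a + b * f for f in feature[5:]]
error = mse(profit[5:], predicted)
print(a, b, error)
```

With real data the error would not vanish, and the same split could be slid forward year by year to cover the whole period.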
Performance Evaluation
Prediction of the mean profits of 20 companies
Discovered
Network features do not seem to contribute to revenue prediction but rather contribute
to predicting companies’ profit.
Company profit prediction by joint network and financial analysis outperforms network-only
by 150% and financial-only by 34%.
Performance Evaluation
Prediction of the mean profits of IBM and INTEL
Aspects of Network science in paper.
• Graph theory: degree of connectivity, diameter and shortest paths are used to calculate
network effects
• Developing models to understand the network
• Extracting data from NYT; Problem Statement part of the paper.
Building models to anticipate the evolution of the networks.
• Network effects, company valuations
Constructing models to optimise the outcomes of networks
Experimental results and improvements.
What else can be done.
Improvements
1. A company’s value (or performance) may encompass several factors depending on the
context in which it’s defined, such as
• market performance, employee satisfaction and responsibility. Analysis of these
areas can potentially improve the model’s performance.
2. More social network data sources can be used, e.g.
• social media, especially Twitter: Twitter or Facebook analysis could provide the longitudinal
social network data.
3. Categorizing relations as negative or positive using sentiment analysis, and handling the
networks separately, i.e. positive impact relation networks on their own and negative
impact relation networks on their own.


Editor's notes

  1. Precisely, the paper aims to deal with the three bullets, in the order placed above. Main point: the researchers aimed to develop a formula that would predict a company's financial value over a period of time. Techniques employed included: network analysis (graph theory), statistical methods (correlation), machine learning algorithms (regression), and algebraic equations (e.g. the one used for the relation score).
  2. The concept introduced in this research was interesting, and not one that is easily thought of. It assumes that if companies co-appear in written records, then they are most likely impacting each other in one way or another. That definitely makes sense. In football, for instance, you will often find two giant rival clubs mentioned alongside each other in documents, articles or anywhere else. They impact each other by virtue of their rivalry: as one goes into the market to purchase top players, the other makes a similar move just to stay on top. However, "mutual relationship" was something we didn't agree with, because the impact isn't necessarily based on a common understanding but is rather an automatic impact. So when the researcher claims mutual relationship
  3. Extracting data about the relations on a local as well as a global level, and going back through the years to capture historical relations between companies, was smart. Past and present statistics speak volumes about the future. To filter out the companies that made the most impact, an algorithm that ranks the relations between them was useful: page ranking (used to rank the importance of web pages by the count of back links to the page). Regression (a machine learning algorithm) is a very reliable predictive analysis tool; it was used to project outcomes after putting together all the necessary metrics discussed earlier, namely the network feature metrics and the financial metrics.
  4. The New York Times was the source of the data used in the research. Interviews, questionnaires and observations would have been a brilliant addition, because the researchers would then have had more elaborate answers to their questions, which would validate the data published in the articles. DISAGREE: a variety of data sets generated from different locations would be ideal. We didn't agree with the fact that only one source was used.
  5. The target company was the focal company, so the companies with which it relates formed this company's network. $G_t$ (a vector or set) represents the graphs for the specified times, and $x \in V$. You can flip through the slides and talk about the network effects to make this clear enough.
  6. The relation score (RS) was an indicator of the strength of the relationship between companies. In graph-theory terms, it can be understood as a degree of connectivity. Candidate company: the company being investigated to discover how strong its relationship with the target company is. RS was obtained by summing the weights of all the documents and sentences in which the companies of interest co-appeared. The weights were obtained using the formulas above. $Y'(i)$ and $Y''(j)$: counts of the company names that appear in document $i$ and sentence $j$. $tf_x(i)$ and $tf_y(i)$: frequencies of company names $x$ and $y$ in document $i$ and sentence $j$. $a$ and $b$ were constants that represented the trade-off between document weight and sentence weight. The higher this metric, the more connected the involved companies were.
  7. Degree of connectivity of the target company. Eccentricity of the target company. Degree of connectivity of the candidate companies. Vertex degree of the graph. Eccentricity of the candidate companies (related nodes). Betweenness centrality.
  8. $(x, i)\ \forall\, i \in N_x$: the target company $x$ and a candidate company $i$, for each company $i$ in node set $N_x$. $(i, j)\ \forall\, i, j \in N_x,\ i \neq j$: candidate companies $i$ and $j$, both belonging to node set $N_x$. $(i, k)\ \forall\, i \in N_x,\ k \in V$: $k$ belongs to the big network of all nodes, so it can belong to any subgraph within the entire network.
  9. $L_x$: set of nodes that are indirectly connected to $x$. $L_x$: set of nodes that are directly connected to $x$. 3. We agree, and find this very useful.
  10. Clearly there is an overlap in the research methodologies of these three areas: they draw on data gathered from social networks, infrastructures, sensors and the Internet of Things.
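
The relation score in note 6 and two of the graph features listed in note 7 can be sketched as follows. This is a hedged illustration only. The notes do not reproduce the per-document and per-sentence weight formulas, so `relation_score` takes already-computed weights as input. The default values of `a` and `b`, the function names and the four-node toy graph are all assumptions made for the example. Betweenness centrality is left out to keep the sketch short.

```python
from collections import deque


def relation_score(doc_weights, sent_weights, a=0.5, b=0.5):
    """Weighted sum of document-level and sentence-level co-occurrence weights.

    a and b trade off document weight against sentence weight; the defaults
    are placeholders, not values taken from the paper.
    """
    return a * sum(doc_weights) + b * sum(sent_weights)


def bfs_distances(graph, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def degree(graph, node):
    """Degree of connectivity: number of direct neighbours."""
    return len(graph[node])


def eccentricity(graph, node):
    """Largest hop distance from node to any node it can reach."""
    return max(bfs_distances(graph, node).values())


# Hypothetical undirected network: target company "x" and candidates c1..c3.
graph = {
    "x": {"c1", "c2"},
    "c1": {"x", "c3"},
    "c2": {"x"},
    "c3": {"c1"},
}

print(relation_score([0.4, 0.6], [0.5], a=0.7, b=0.3))
print(degree(graph, "x"), eccentricity(graph, "x"))
```

Passing the weights in, rather than computing them, keeps the sketch faithful to what the notes actually state: RS sums the document and sentence weights, with `a` and `b` balancing the two.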