BIG	DATA	ANALYTICS
UNDERSTANDING	FOR	RESEARCH	ACTIVITY
Dr. Andry Alamsyah
Asosiasi Ilmuwan Data Indonesia
School of Economics and Business,Telkom University
Research Field :
Social	Computing,	Social	Network,	Complex	Network	/	Network	Science,	Computational	Social	
Science,	Data	Analytics,	Big	Data,	Data	Mining,	Graph	Theory,	Disruptive	Innovation	/	Disruptive	
Economy,	ICT	Entrepreneurial	Business,	Data	/	Information	Business
Andry Alamsyah
• Researcher / Data Scientist
• Director of Digital Business Ecosystem Research Centre
• Chief and Founder of Lab. Social Computing & Big Data
• Chairman & Founder Indonesian Data Scientist Society (AIDI)
email andry.alamsyah@gmail.com
blog andrya.staff.telkomuniversity.ac.id
repository	 telkomuniversity.academia.edu/andryalamsyah
repository researchgate.net/profile/Andry_Alamsyah
linkedin linkedin.com/andry.alamsyah
twitter twitter.com/andrybrew
Education :
S1 (Bachelor): Mathematics - ITB, Topic: Statistics
S2 (Master): Informatics - UPJV, France, Topic: Information System and Multimedia
S3 (Doctorate): Electro and Informatics - ITB, Topic: Social Network and Big Data
Links :
Introduction
• Background	and	Motivation		
• Big Data Definition and Related Field
• Understanding	Pattern	
• Data	Analytics	/	Machine	Learning	Fundamental	(Prediction	and	
Recommendation)	
• Social	Media	Analytics	(by	Case	Study)	
• Conclusion	
• Working	on	Your	Computer	(Machine	Learning	Practice)
Agenda
Background	&	Motivation
Remember	This	?
>	information	overload,			
>	technological	based	society	
>	acquire	new	value	=>	new	culture
>	empowered	individuals	
>	more	data	available	
>	building	contextual	story	/	search
Digital	Ocean
Storytelling
Contextual	Story
• Industry 4.0 -> cyber-physical system -> enabling humans to produce large-scale data -> human behaviour quantification
• Key	Technologies	:	data, computational power	and	connectivity;	analytics	
and	intelligence;	human	machine	interaction;	advanced	production	methods
the environment
Deloitte, Industry 4.0
Industry	4.0
Competing	Ecosystem	&	Data
Cheap	Change	Everything
efficient economy
new value proposition
• cutting through the BIG DATA hype
• cheap means everywhere
• cheap creates value
• from cheap to strategy
complex		
human	behaviour
market	uncertainty
business		
sustainability
disruptive	economic	
coopetitive,	cooperative,	
competitive
business	ecosystem		
/	platform
programmable	economy
event	driven	
API	economy
toward large-scale and massive
socio-economic impact
Industry 4.0
Big Data Definition and
Related Field
Big Data Definition
•a term => describe extremely large amounts of structured and
unstructured data

•the activity => capture / storage / processing / sharing / reporting of
data => beyond the ability of legacy software tools and hardware
infrastructure

•related to many “science” branch => data analytics, data science,
machine learning,  artificial intelligence, IoT, and many more

•the application => on many field => efficient, cost-effective, faster &
accurate decision making
Gigabyte   10^9  = 1.000.000.000
Terabyte   10^12 = 1.000.000.000.000
Petabyte   10^15 = 1.000.000.000.000.000
Exabyte    10^18 = 1.000.000.000.000.000.000
Zettabyte  10^21 = 1.000.000.000.000.000.000.000
                 1990           2010
store            1400 MB        1 TB
transfer speed   4.5 MB/s       100 MB/s
read drive       ~ 5 minutes    ~ 3 hours

Hadoop: 100 drives working at the same time can read 1 TB of data in 2 minutes
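The read times quoted on this slide follow from dividing capacity by transfer speed. A minimal sketch of that arithmetic, assuming 1 TB = 1,000,000 MB and that the data is split evenly across drives read in parallel (the function name is illustrative, not from the slides):

```python
def read_time_seconds(capacity_mb: float, speed_mb_per_s: float, drives: int = 1) -> float:
    """Time to read capacity_mb when it is split evenly across drives read in parallel."""
    return capacity_mb / (speed_mb_per_s * drives)

t_1990 = read_time_seconds(1400, 4.5)                        # single 1990 drive
t_2010 = read_time_seconds(1_000_000, 100)                   # single 2010 drive
t_parallel = read_time_seconds(1_000_000, 100, drives=100)   # 100 drives at once

print(f"1990 drive: {t_1990 / 60:.1f} min")
print(f"2010 drive: {t_2010 / 3600:.1f} h")
print(f"100 drives: {t_parallel / 60:.1f} min")
```

The results land close to the slide's rounded figures of about 5 minutes, 3 hours and 2 minutes.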
Volume, Variety, and Velocity are the "essential" characteristics of Big Data
Veracity, and Value are the "quality" of Big Data
The 5 V's
DATA ANALYTICS
-the discovery, interpretation, and communication of meaningful patterns in data (wikipedia)
-the process to uncover hidden patterns, unknown correlation, and other useful information that can help organisations make
more informed business decision
SOURCE
review, opinion,
historical data,
conversation,
network friendship,
CCTV, Vlog,
location tagging,
etc
BIG DATA
large, fast, complex
the 5V’s data
DATA SCIENCE
the science to extract
knowledge / pattern from data
SOCIAL COMPUTING
quantification of human / social
behaviour
INSIGHT
market segmentation, risk analytics
information dissemination,
recommended investment, fraud
detection, personalised adv, customer
acquisition and retention, purchase
behaviour, early detection event,
brand awareness, etc
opportunity activity
methodology
benefit
application
Big	Data	Related	Terms	(Use	Case)
Data	Analytics
• The discovery, interpretation, and communication of meaningful patterns in data (wikipedia)

• The process to uncover hidden patterns, unknown correlations, and other useful information that
can help organisations make more informed business decisions. Types: predictive, descriptive, diagnostic,
prescriptive.
Predictive	Analytics
• study the past if you want to study the future (confucius)

• Predictive Analytics is the art of building and using models that make predictions
based on patterns extracted from historical data. Predictive analytics applications
include: price predictions, dosage predictions, risk assessment, propensity/likelihood
modelling, diagnosis, document classification
• Prediction is the assignment of a value to any unknown variable.

• A model is trained to make predictions based on a set of historical examples. (we use
Machine Learning)
Data	Science
Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to
extract knowledge and insights from structured and unstructured data.
Data	Science
Body	of	Knowledge
CRISP-DM
CRISP-DM -> Cross-Industry Standard Process for Data Mining is an open standard process model that
describes common approaches used by data mining experts. It is the most widely-used analytics model.
Structure	Data	Type
Column          Value
Patient         Andry Alamsyah
Date of Birth   12/07/1995
Date Admitted   02/03/2019
“The patient came in complaining
of chest pain, shortness of breath,
and lingering headaches.. Smokes
2 packs a day.. Family history of
heart disease.. Has been
experiencing similar symptoms for
the past 12 hours…”
Structured: high degree of organization, such as a relational database
VS
Unstructured: information that is difficult to organise using traditional mechanisms
Working	with	Structured	Data
Working	with	Unstructured	Data
brand A brand B
(Big)	Data	Opportunity
Understanding	Pattern
Understanding	Pattern
Structured Data
Mapping Position
Understanding	Pattern
Unstructured Data
Friendship Network
Understanding	Pattern
Unstructured Data
Growth Friendship Network
Understanding	Pattern
Unstructured Data
Conversational Network
Understanding	Pattern
Structured Data
Regional Economic Value
based on Checkin
Mechanism
How	Can	(Big)	Data	Analytics	Helps?
by describing the phenomenon,
by predicting the value,
by estimating the future outcome,
by optimising the resources and the
decision,
by simulating all the possible scenarios ..
Data	Analytics	/	Machine	Learning	
Fundamentals	(Prediction	and	
Recommendation)
• Machine learning is defined as an automated process that extracts
patterns from data to build the models used in predictive analytics
applications.

• A branch of artificial intelligence, concerned with the design and
development of algorithms that allow computers to evolve behaviours
based on empirical data.
Machine	Learning
Machine	Learning
Machine Learning is an idea to learn from
examples and experience, without being
explicitly programmed.
Instead of writing code, we feed data to the
generic algorithm, and it builds logic based
on the data given.
Computer Output
Program
Data
• Traditional Programming
Computer Program
Output
Data
• Machine Learning
Machine	Learning
Machine learning (ML) is the science of getting computers to act without being
explicitly programmed. ML has given us self-driving cars, practical speech recognition,
effective web search, and a vastly improved understanding of the human genome. ML is
pervasive today, we probably use it dozens of times a day without knowing it. It is the best
way to make progress towards human-level AI. (Stanford/Coursera)
ML is a type of artificial intelligence (AI) that provides computers with the ability to
learn without being explicitly programmed. ML focuses on the development of computer
programs that can teach themselves to grow and change when exposed to new data. (whatis.com)
Learning	Methodology
Machine	Learning	in	Business
Finance and Banking
• Credit scoring
• Fraud detection
• Risk Analysis
• Portfolio Optimization
• Client Analysis
• Trading Exchange Forecasting
Retail and E-Commerce
• Price Optimization
• Recommendation
• Predictive Inventory Planning
• Fraud Detection
• Customer Segmentation
Manufacturing
• Predictive Maintenance or Condition
Monitoring.
• Warranty reserve estimation
• Demand forecasting
• Process Optimization
Marketing and Sales
• Market and Customer Segmentation
• Price Optimization
• Customer Churn Analysis
• Customer lifetime value prediction
• Sentiment Analysis in Social Networks
1.Formula	/	Function	
• T	=	0.48O	+	0.23TL	+	0.5D	
2.Decision	Tree	
3.	Correlation	or	Association	
4.Rule		
• IF	IPS3=2.8		THEN	graduate_ontime	
5.Cluster	
Output	/	Pattern	/	Model	/	Knowledge
Learning	Illustration
[Figure: data points labelled A and B ("Data ->"), shown with two possible solutions, 1 and 2]
•It is based on a labeled training set.
•The class of each piece of data in
training set is known.
•Class labels are pre-determined
and provided in the training phase.
Supervised	Learning
[Figure: training data points A and B, each annotated with a known class label]
“What is the class of this data point?”
Task performed : classification, pattern recognition
Supervised	Learning
•Prediction	methods	are	commonly	referred	to	as	
supervised	learning.	Supervised	methods	are	
thought	to	attempt	the	discovery	of	the	
relationships	between	input	attributes	and	a	target	
attribute.	
•A	training	set	is	given	and	the	objective	is	to	form	a	
description	that	can	be	used	to	predict	unseen	
examples.
Supervised	Learning
Problems	:	
• Classification
  • The domain of the target attribute is finite and categorical.
  • A classifier must assign a class to an unseen example.
• Regression
  • The target attribute takes numeric values from an infinite (continuous) domain.
  • To fit a model that learns the output target attribute as a function of the input attributes.
• Time Series Analysis
  • Making predictions in time.
Supervised	Learning
Supervised	Learning
Class/Label/TargetAttribute/Feature
Nominal
Numeric
Unsupervised	Learning
• Input: set of patterns P, from n-dimensional space S, but little or no
information about their classification, evaluation, interesting features, etc.
  It must learn these by itself! :)
• Tasks:
  - Clustering - Group patterns based on similarity
  - Vector Quantisation - Fully divide up S into a small set of regions (defined by
    codebook vectors) that also helps cluster P.
  - Feature Extraction - Reduce dimensionality of S by removing unimportant
    features (i.e. those that do not help in clustering P)
• There is no supervisor and only input data is available.
• The aim is now to find regularities, irregularities, relationships,
similarities and associations in the input.
Unsupervised	Learning
Problems	:	
• Clustering	
• Association	Rules	
• Pattern	Mining	
• It	is	adopted	as	a	more	general	term	than	frequent	pattern	mining	or	
association	mining	
• Outlier	Detection	
• It is the process of finding data which have very different behaviour
from the expectation (outliers or anomalies)
Unsupervised	Learning
Unsupervised	Learning
Attribute/Feature
Background	:	
• How	to	learn	a	new	skill	
• Learning	and	intelligence	
• Interaction	with	environment	
• Goal-oriented	learning	
• Agent	–	Environment	interactions	
• Activities	
- What	to	do		
- How	to	map	situations	to	actions	
- Process	positive	and	negative	rewards
Reinforcement	Learning
Reinforcement	learning	(RL)	is	an	area	of	machine	learning	concerned	with	
how	software	agents	ought	to	take	actions	in	an	environment	so	as	to	maximise	
some	notion	of	cumulative	reward.	
Basic reinforcement learning is modeled as a Markov decision process, and is often stochastic.
The	Analogy	
• A	child	learns	to	walk	
• The	child	is	an	agent	trying	to	manipulate	the	
environment	
• The	child	is	taking	actions	(state	1,	state	2,	state	
3,	and	so	on)	
• Positive	rewards	when	able	to	walk	
• Negative	rewards	when	not	able	to	walk
Reinforcement	Learning
Reinforcement	Learning
Various	Practical	applications	of	Reinforcement	Learning		
• RL	can	be	used	in	robotics	for	industrial	automation.	
• RL	can	be	used	in	machine	learning	and	data	processing	
• RL	can	be	used	to	create	training	systems	that	provide	custom	instruction	and	materials	according	
to	the	requirement	of	students.
Data	Preparation	(CRISP-DM)
Data Preprocessing
• Measures for data quality: A multidimensional view
• Accuracy: correct or wrong, accurate or not
• Completeness: not recorded, unavailable, …
• Consistency: some modified but some not, …
• Timeliness: timely update?
• Believability: how much can the data be trusted to be correct?
• Interpretability: how easily the data can be understood?
1. Data Cleaning
   a. Fill in missing values
   b. Smooth noisy data
   c. Identify or remove outliers
   d. Resolve inconsistencies
2. Data Reduction
   a. Dimensionality reduction
   b. Numerosity reduction
   c. Data compression
3. Data Transformation and Data Discretisation
   a. Normalisation
   b. Concept hierarchy generation
4. Data Integration
   a. Integration of multiple databases or files
Data	Preprocessing	Task
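Two of the preprocessing tasks listed above, filling in missing values and normalisation, can be sketched in a few lines. This is only an illustration under stated assumptions: missing values are marked as `None`, they are filled with the column mean, and normalisation means min-max scaling to [0, 1]. The slides do not prescribe these specific choices, and the sample column is hypothetical.

```python
from statistics import mean

def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalise(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

raw = [10.0, None, 30.0, 20.0]           # hypothetical column with one missing value
cleaned = fill_missing_with_mean(raw)    # the gap is filled with the mean, 20.0
scaled = min_max_normalise(cleaned)      # values now lie between 0 and 1
```

Mean imputation and min-max scaling are only one option each; median imputation or z-score standardisation would fit the same slots.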
Common	Data	
Analytics	Rules
Tasks / Description / Algorithms / Examples

Classification
  Description: Predict whether a data point belongs to one of the predefined classes. Prediction is based on learning from a known dataset.
  Algorithms: Decision tree, neural network
  Example: Bucketing new customers into one of the known customer groups

Regression
  Description: Predict the numeric target label of a data point. Prediction is based on learning from a known dataset.
  Algorithms: Linear regression, logistic regression
  Example: Estimating insurance premium

Clustering
  Description: Identify natural clusters within the data set based on inherent properties of the data set.
  Algorithms: K-Means, density based clustering
  Example: Finding customer segments in a company based on transaction and call data

Association Rules
  Description: Identify relationships within an item set based on transaction data.
  Algorithms: FP-Growth algorithm, Apriori
  Example: Finding cross-selling opportunities for a retailer based on transaction purchase history

Anomaly Detection
  Description: Predict whether a data point is an outlier compared to the other data points in the dataset.
  Algorithms: Distance based, density based, Local Outlier Factor (LOF)
  Example: Fraud transaction detection in credit cards
Estimation
Customer   Order   Number of Traffic Lights   Distance   Travel Time
1          3       3                          3          16
2          1       7                          4          20
3          2       4                          6          18
4          4       6                          8          36
...
1000       2       4                          2          12
Label
Learning Model using
Estimation Methods (Linear
Regression)
Travel Time = 0.48O + 0.23TL + 0.5D
Knowledge
Pizza Delivery Time
Predictions
stock price dataset in
time series format
label
prediction using
Neural Network
Learning
prediction plot
Classification
(NIM = student ID number, IPS = semester GPA)
NIM     Gender   National Exam Score   School of Origin   IPS1   IPS2   IPS3   IPS4   ...   Graduated on Time
10001   M        28                    SMAN 2             3.3    3.6    2.89   2.9          Yes
10002   F        27                    SMA DK             4.0    3.2    3.8    3.7          No
10003   F        24                    SMAN 1             2.7    3.4    4.0    3.5          No
10004   M        26.4                  SMAN 3             3.2    2.7    3.6    3.4          Yes
...
...
11000   M        23.4                  SMAN 5             3.3    2.8    3.1    3.2          Yes
label
learning using C4.5
classification methods
input : golf playing recommendation
output (rules) :
If	outlook	=	sunny	and	humidity	=	high	then	play	=	no

If	outlook	=	rainy	and	windy	=	true	then	play	=	no

If	outlook	=	overcast	then	play	=	yes

If	humidity	=	normal	then	play	=	yes

If	none	of	the	above	then	play	=	yes
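The rule set above can be read as an ordered decision list, where the first matching rule decides. The sketch below encodes it directly; the ordering assumption, the function name and the boolean encoding of `windy` are illustrative choices, not something the slide specifies.

```python
def play_golf(outlook: str, humidity: str, windy: bool) -> str:
    """Apply the slide's rules in order; the first matching rule decides."""
    if outlook == "sunny" and humidity == "high":
        return "no"
    if outlook == "rainy" and windy:
        return "no"
    if outlook == "overcast":
        return "yes"
    if humidity == "normal":
        return "yes"
    return "yes"  # default rule: none of the above

decision = play_golf(outlook="sunny", humidity="high", windy=False)
```

Reading the rules in order matters: a rainy, windy day with normal humidity is caught by the second rule before the humidity rule is reached.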
output (tree) :
Classification
Clustering
dataset without label
learning using K-means
clustering methods
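As a rough illustration of how K-means groups unlabeled data, the sketch below alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sample points and the initial centroids are made up for the example; real use would normally rely on a library implementation and a proper initialisation strategy.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate an assignment step and a centroid update step."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Hypothetical 2-D data with two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
labels, centroids = k_means(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
```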
Association
learning using FP-Growth
association methods
1. Estimation:
   - Linear Regression, Neural Network, Support Vector Machine, etc
2. Prediction/Forecasting:
   - Linear Regression, Neural Network, Support Vector Machine, etc
3. Classification:
   - Naive Bayes, K-Nearest Neighbor, C4.5, ID3, CART, Linear Discriminant Analysis, Logistic Regression, etc
4. Clustering:
   - K-Means, K-Medoids, Self-Organizing Map (SOM), Fuzzy C-Means, etc
5. Association:
   - FP-Growth, Apriori, Coefficient of Correlation, Chi Square, etc
Algorithm	in	Data	Analytics
Based	on	Information	Theory,	for	example	in	Decision	Tree	model	
induced	by	the	concept	of	entropy	and	information	gain.
Information-Based	Learning
Is	it	a	man	?,		
Does	the	person	wear	glasses	?
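Entropy and information gain, the two quantities a decision tree uses to choose its splits, can be computed as sketched below. The rows are a hypothetical toy dataset in the spirit of the golf example (14 records with an `outlook` attribute and a `play` label); the counts are an assumption for illustration and do not come from the slides.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical data: 9 "yes" and 5 "no" records spread over three outlook values.
rows = (
    [{"outlook": "sunny", "play": "yes"}] * 2 + [{"outlook": "sunny", "play": "no"}] * 3
    + [{"outlook": "overcast", "play": "yes"}] * 4
    + [{"outlook": "rainy", "play": "yes"}] * 3 + [{"outlook": "rainy", "play": "no"}] * 2
)
base_entropy = entropy([r["play"] for r in rows])
gain = information_gain(rows, "outlook", "play")
```

A tree learner would compute this gain for every candidate attribute and split on the one with the highest value.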
Similarity-Based	Learning
Training
Records
Test Record
Compute
Distance
Basic Idea => If it walks like a duck and quacks like a duck, then it's probably a duck
The best way to make a prediction is to simply look at what has worked well in the past and
predict the same thing again, for example the k-NN and k-means algorithms.
Similarity can be represented as a distance (e.g. Euclidean).
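The "compute distance from the test record to the training records" idea can be sketched as a small k-nearest-neighbour classifier with Euclidean distance and majority voting. The training records and `k=3` are illustrative assumptions.

```python
from collections import Counter
from math import dist

def knn_predict(training, query, k=3):
    """training is a list of (feature_vector, label) pairs."""
    # Sort training records by Euclidean distance to the query and keep the k closest.
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    # Majority vote among the labels of those neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled records in two well-separated groups.
training = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.3), "A"),
            ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.3), "B")]
prediction = knn_predict(training, (1.1, 1.0))
```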
Probability-based	prediction	approaches	are	heavily	based	on	Bayes’	Theorem
Probability-Based	Learning
• A probabilistic framework for solving classification problems
•	Conditional	Probability	/	Bayes	Theorem
$$P(C \mid A) = \frac{P(A \mid C)\,P(C)}{P(A)}$$
•	Given:		
• A	doctor	knows	that	meningitis	causes	stiff	neck	50%	of	the	time	
• Prior	probability	of	any	patient	having	meningitis	is	1/50,000	
• Prior	probability	of	any	patient	having	stiff	neck	is	1/20
•	If	a	patient	has	stiff	neck,	what’s	the	probability	he/she	has	meningitis?	
$$P(M \mid S) = \frac{P(S \mid M)\,P(M)}{P(S)} = \frac{0.5 \times 1/50000}{1/20} = 0.0002$$
perform	a	search	for	a	set	of	parameters	for	a	parameterised	model	that	minimises	the	
total	error	across	the	predictions	made	by	that	model	with	respect	to	a	set	of	training	
instances.	For	example:	multivariable	linear	regression	with	gradient	descent,	support	
vector	machine	
Error-Based	Learning
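As a minimal sketch of error-based learning, the code below searches for the two parameters of a line y = w*x + b by batch gradient descent on the mean squared error. The data, learning rate and number of epochs are assumptions chosen so that the toy problem converges; they are not taken from the slides.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error of the current model on every training instance.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient to reduce the total error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x + 1 for x in xs]   # synthetic targets generated from y = 2x + 1
w, b = fit_line(xs, ys)        # should recover values close to w = 2, b = 1
```

The same loop extends to several input attributes by keeping one weight per attribute.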
[Figure: two candidate separating hyperplanes, B1 (margin boundaries b11, b12) and B2 (margin boundaries b21, b22)]
Linear Regression
Find the hyperplane that maximizes the margin => B1 is better than B2
Support Vector Machine
Model	Evaluation
1.Estimation:	
- Error:	Root	Mean	Square	Error	(RMSE),	MSE,	MAPE,	etc	
2.Prediction/Forecasting	
- Error:	Root	Mean	Square	Error	(RMSE)	,	MSE,	MAPE,	etc	
3. Classification:
- Confusion	Matrix:	Accuracy	
- 	ROC	Curve:	Area	Under	Curve	(AUC)		
4.Clustering:	
- Internal	Evaluation:	Davies–Bouldin	index,	Dunn	index,		
- External	Evaluation:		Rand	measure,	F-measure,	Jaccard	index,	
Fowlkes–Mallows	index,	Confusion	matrix	
5.Association:	
- Lift	Charts:	Lift	Ratio	
- Precision	and	Recall	(F-measure)
learning and evaluation process confusion matrix
                       PREDICTED CLASS
                       Class=Yes   Class=No
ACTUAL    Class=Yes    a           b
CLASS     Class=No     c           d
a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)
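$$\text{Accuracy} = \frac{a + d}{a + b + c + d} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision } (p) = \frac{a}{a + c}$$

$$\text{Recall } (r) = \frac{a}{a + b}$$

$$\text{F-measure } (F) = \frac{2rp}{r + p} = \frac{2a}{2a + b + c}$$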
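The confusion-matrix metrics can be computed directly from the four cell counts. The sketch below keeps the slide's cell names (a = TP, b = FN, c = FP, d = TN); the counts passed in are hypothetical.

```python
def classification_metrics(a, b, c, d):
    """a=TP, b=FN, c=FP, d=TN, following the slide's confusion matrix."""
    accuracy = (a + d) / (a + b + c + d)
    precision = a / (a + c)
    recall = a / (a + b)
    f_measure = 2 * recall * precision / (recall + precision)
    return accuracy, precision, recall, f_measure

# Hypothetical counts: 40 true positives, 10 false negatives,
# 5 false positives, 45 true negatives.
accuracy, precision, recall, f_measure = classification_metrics(a=40, b=10, c=5, d=45)
```

With these counts the F-measure equals 2a / (2a + b + c) = 80 / 95, matching the closed form of the formula.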
Model	Evaluation
evaluation metric
Model	Evaluation
• Learning	curve	shows	how	
accuracy	changes	with	varying	
sample	size	
• Requires	a	sampling	schedule	for	
creating	learning	curve:	
- Arithmetic	sampling	
- Geometric	sampling	
• Effect	of	small	sample	size:	
- Bias	in	the	estimate	
- Variance	of	estimate
Increase	Coverage
Experiment Dataset Accuracy
1 93%
2 91%
3 90%
4 93%
5 93%
6 91%
7 94%
8 93%
9 91%
10 90%
Average	Accuracy 92%
Orange Box: k-th subset (testing data)
K-Fold Cross-Validation
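A sketch of how the k folds can be formed and how the per-experiment accuracies are averaged. The split into contiguous folds and the sample size of 20 are assumptions for illustration (practical splits are usually shuffled first); the accuracy values are copied from the table above.

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

# Each sample lands in the test subset of exactly one experiment.
folds = list(k_fold_indices(n_samples=20, k=10))

# Accuracy (%) of the 10 experiments, as listed on the slide.
accuracies = [93, 91, 90, 93, 93, 91, 94, 93, 91, 90]
average = mean(accuracies)  # 91.9, which the slide rounds to 92%
```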
The	Future	ML	Trends
artificial neural network
convolutional neural network
deep learning
Social	Media	Analytics
Workflow
Application
Programming
Interface
(API)
Crawling
Process
> Network Structure
(Social Network Analysis)
> Content Analysis
(Text Analytics)
Pattern Mining and
Analytics Process
Social	Network	Analysis
Content	Analysis
Text	Network
First Topic
Identified
Topic	Modelling
•Topic modelling is a type of statistical modelling for discovering the abstract
“topics” that occur in a collection of documents..

•LDA (Latent Dirichlet Allocation) is one of the most popular topic modelling techniques
TOP BRAND ALTERNATIVE MEASUREMENT BASED ON
CONSUMER NETWORK ACTIVITY
Abstract:
In Business Intelligence effort, the legacy methodology to
measure product brand awareness use technique such as
surveys, interviews, and questionnaires. This methodology
requires expensive effort to collect data from respondent and
takes considerably time to accomplish. The availability of Big
Data in the form of social media interaction can benefit us.
The conversation and user generated content from social
media certainly can be used to measure brand awareness
through consumer activity. We use Social Network Analysis
methodology to measure the dynamic and evolution of brand
conversations in social media. By comparing the network
properties, we propose new alternative measurement
methods of product brand awareness. Our proposed
methodology is better adapted to large-scale conversational
data in social media. This measurement will also enhance the
current methodology by viewing consumer opinions as a
whole network and not as separated individual. This study
conducted via social networking conversations on Twitter using
two industry case studies, they are mobile operators and
mobile phone brands in Indonesia
mobile phone rank
mobile operator rank
A COMPARISON OF INDONESIA E-COMMERCE SENTIMENT ANALYSIS FOR
MARKETING INTELLIGENCE EFFORT
CASE STUDY : BUKALAPAK, TOKOPEDIA, ELEVENIA
Abstract: The rapid growth of e-commerce market in Indonesia, making various e-commerce companies
appear and there has been high competition among them. Marketing intelligence is important activity to
measure competitive position. One element of marketing intelligence is to assess customer satisfaction.
Many Indonesian customers express their sense of satisfaction or dissatisfaction towards the company
through social media. Hence, using social media data, it provides a new practical way to measure
marketing intelligent effort. This research performs sentiment analysis using naive bayes classifier
classification method with TF-IDF weighting. We compare the sentiments towards of top-3 e-commerce
sites visited companies, they are Bukalapak, Tokopedia and Elevenia. We use Twitter data for sentiment
analysis because it's faster, cheaper and easier from both the customer and the researcher side. The
purpose of this research is to find out how to process the huge customer sentiment Twitter to become
useful information for the e-commerce company, and which of those top-3 e-commerce companies has
the highest level of customer satisfaction. From the experiment results, it shows the method can be used
to classify customer sentiments in social media Twitter automatically and Elevenia is the highest
e-commerce with customer satisfaction
COMPARABLE	RESULT	
AMONG	THREE	CASE	STUDY
NETWORK TEXT ANALYSIS TO SUMMARISE ONLINE CONVERSATIONS FOR
MARKETING INTELLIGENCE EFFORTS IN TELECOMMUNICATION
INDUSTRY
Abstract - Market tight competition put pressure the companies to employ a new and faster way to support their
marketing intelligence effort. The need of marketing intelligence includes gathering and analysing data for confident
decision making about market and its competition. Today, the abundant large scale data from online social network
services has made possible to extract valuable information such as user opinions and sentiment from the
conversations in the market. As the competition arise, new challenge emerged, which include faster data
summarisation. The common practice of summarise contents is using wordcloud or weighted list of appearance words.
This approach is lack of sense and contextual relations between words in questions, because the words has no
connection with other words that might construct an important phrase. With the help of graph formulation, we
propose a methodology of network text analysis to summarise large conversation in online social network services.
This proposed methodology capture complex relations between words, while still maintain fast summarisation. In this
paper, we compare three major telecommunication provider in Indonesia, which is Telkomsel, XL and Indosat. The
conversations about those brands in online social network services Twitter is collected, Network text about each
brands are constructed and analysed.
NETWORK MARKET ANALYSIS USING LARGE SCALE SOCIAL
NETWORK CONVERSATION OF INDONESIA FAST FOOD INDUSTRY
Abstract - The high competitiveness of the Indonesia Fast Food market has forced the industry to find the new way to understand market behaviour. The new challenge
should include faster data collection and analytical process, preferably time delivery needed close to real-time. The common practice of gathering market data using
questionnaires and interviews are considered expensive and time-consuming process compared to mining online conversation with brand community respected. With the
availability of large-scale data from online social network services (oSNS), we can extract valuable information represent dynamic behaviour of the market. Many brands have
their presence in oSNS as a part of their customer relationship management (CRM) effort. The social interactions formed in oSNS can be modeled using Social Network
Analysis (SNA) methodology. In this paper, we compare two brand communities of head to head competitive product in the fast food industry, they are McDonald’s and Burger
King. The SNA model constructs large-scale network, its size, reaching close to a million of nodes and edges. The result will give us insight about what is important in
understanding the dynamic market beside the market size represented by the community conversations.
SOCIAL NETWORK AND SENTIMENT ANALYSIS FOR SOCIAL CUSTOMER
RELATIONSHIP MANAGEMENT IN INDONESIA BANKING SECTOR
	 	 	
SCRM Network
BCA BNI MANDIRI
Abstract - The increasing number of social media users affects both individual and corporation user. Banking sector, for example, use social media to support
their Social Customer Relationship Management activity. We investigate the dynamics and evolution of conversation network between bank customer using
Social Network Analysis methodology. Measurement is conducted by calculating its network properties to see the characteristic and how active the network is.
Customers talking about banks’ services can also express their opinion on social media. Therefore we perform sentiment analysis to classify customer’s opinion
into positive, negative and neutral class. This research was performed on Twitter’s conversation about Bank Mandiri, Bank Central Asia (BCA) and Bank
Negara Indonesia (BNI). The result of this research is beneficial for business intelligence purpose to support decision making.
MEASURING MARKETING COMMUNICATIONS MIX EFFORT USING
MAGNITUDE OF INFLUENCE AND INFLUENCE RANK METRIC
Abstract: In the context of modern marketing, Twitter is considered as a communication platform to spread information. Many companies create and acquire several Twitter
accounts to support and perform varieties of marketing mix activities. Initially, each accounts used to capture specific market profile. Together, the accounts create network of
information that provide consumer to the information they need depends on their contextual utilisation. From many accounts available, we have the fundamental question on
how to measure influence of each account in the market based not only their relations, but also the effects of their postings. Magnitude of Influence (MOI) metric is adapted
together with Influence Rank (IR) measurement of accounts in their social network neighbourhood. We use social network analysis approach to analyse 65 accounts in the social
network of an Indonesian mobile phone network operator, Telkomsel which involved in marketing communications mix activities through series of related tweets. Using social
network provide the idea of the activity in building and maintaining relationships with the target audience. This paper shows the results of the most potential accounts based on
the network structure and engagement. Based on this research, the more number of followers one account has, the more responsibility it has to generate the interaction from
their followers in order to achieve the expected effectiveness. The focus of this paper is to determine the most potential accounts in the application of marketing communications
mix in Twitter.
ratio of affection
magnitude of influence
LCRT function
influence rank (based on pagerank)
MAPPING ONLINE TRANSPORTATION SERVICE QUALITY AND MULTI-CLASS
CLASSIFICATION PROBLEM SOLVING PRIORITIES
CASE STUDY : GOJEK AND GRAB
Abstract. Online transportation service is known for its accessibility, transparency, and tariff affordability. These points make online transportation have
advantages over the existing conventional transportation service. Online transportation service is an example of disruptive technology that change the
relationship between customers and companies. In Indonesia, there are high competition among online transportation provider, hence the companies must
maintain and monitor their service level. To understand their position, we apply both sentiment analysis and multiclass classification to understand customer
opinions. From negative sentiments, we can identify problems and establish problem-solving priorities. As a case study, we use the most popular online
transportation provider in Indonesia: Gojek and Grab. Since many customers are actively give compliment and complain about company’s service level on
Twitter, therefore we collect 61,721 tweets in Bahasa during one month observations. We apply Naive Bayes and Support Vector Machine methods to see which
model perform best for our data. The result reveal Gojek has better service quality with 19.76% positive and 80.23% negative sentiments than Grab with 9.2%
positive and 90.8% negative. The Gojek highest problem-solving priority is regarding application problems, while Grab is about unusable promos. The overall
result shows general problems of both case study are related to accessibility dimension which indicate lack of capability to provide good digital access to the end
users.
HYBRID SENTIMENT AND NETWORK ANALYSIS OF SOCIAL
OPINION POLARIZATION
Abstract: The rapid growth of social media and user generated contents (UGC)
has provided a rich source of potentially relevant data. The problems arise on
how to summarise those data to understand and transforming it into
information. Twitter as one of the most popular social networking and
micro-blogging service can be analysed in terms of content produced with sentiment
analysis. On the other hand, some types of networks can also be constructed to
analyse the social network structure and network properties. This research
intended to combine those content and structural approaches into hybrid
approach for identifies social opinion polarisation, this is in the form of
conversation network. Sentiment analysis used to determine public sentiment,
and social network analysis used to analyse the structure of the network,
detecting communities and influential actors in the network. Using this hybrid
approach, we have comprehensive understanding about social opinion
polarisation. As case study, we present real social opinion polarisation about
reclamation issue in Indonesia.
DYNAMIC LARGE SCALE DATA ON TWITTER USING
SENTIMENT ANALYSIS AND TOPIC MODELLING
Case Study: Uber
Digital flows now exert a larger impact, the world is now more connected than
ever, the amount of cross-border bandwidth that used has grown 45 times larger
since 2005. With the massive amount of data spreading in the net, including
social media, speed is one most essential factor in business. companies can
take advantage of social media as a source to analyse and extract the
customer’s opinion, and therefore the company can have quick response
towards the condition.
The main purpose of this research is content analysis. To reach that goal, we
need to extract the information as well as summarise the topics inside it.
However, analysing the content quickly is challenging because there is a variety
of tools to choose from, each with its own specific output. We use Naïve Bayes
sentiment analysis on a time series, specifically on a daily basis, and topic
modelling based on Latent Dirichlet Allocation (LDA) to evaluate the sentiment
as well as to model the topics discussed.
The purpose of this research is to help both companies and individuals map
public opinion towards a certain topic by analysing the sentiment of the text
and creating a topic model. Real-time information for determining consumer
opinion therefore becomes crucial. Twitter can serve this purpose as a source
of real-time information from user-generated content. We pick Uber as the case
study, viewed as one of the most favoured transportation methods in most parts
of the world. The data collection period is from 10 February 2017 until
28 February 2017, with 1,048,576 tweets collected.
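A minimal sketch of the daily Naïve Bayes sentiment step described above, written in plain Python. The training sentences, the example tweets and the function names are hypothetical, and the real study would rely on a much larger labelled set. The LDA topic modelling step is not covered here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled examples, for illustration only.
train = [
    ("great ride friendly driver", "pos"),
    ("cheap and fast trip", "pos"),
    ("love the app great service", "pos"),
    ("driver cancelled terrible service", "neg"),
    ("surge price too expensive", "neg"),
    ("late driver bad trip", "neg"),
]

def fit(examples):
    """Count words per class for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def daily_sentiment(model, dated_tweets):
    """Share of tweets classified as positive, per day."""
    per_day = defaultdict(Counter)
    for day, text in dated_tweets:
        per_day[day][predict(model, text)] += 1
    return {day: c["pos"] / sum(c.values()) for day, c in sorted(per_day.items())}

model = fit(train)
tweets = [
    ("2017-02-10", "great driver fast trip"),
    ("2017-02-10", "terrible late driver"),
    ("2017-02-11", "too expensive bad service"),
]
print(daily_sentiment(model, tweets))
```

Grouping the predicted labels by date is what turns a plain classifier into the daily time series mentioned in the text.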
ANALYSING EMPLOYEE VOICE USING REAL-TIME FEEDBACK
Abstract: People nowadays tend to use social media as a platform to share their
reviews, emotions, and opinions, including about their jobs. Thus, a lot of data is
available on the web, and a rapid response is needed to analyse and interpret it.
Unfortunately, many organisations still use annual surveys to assess
satisfaction, engagement, and culture in the workplace. Compared to
conventional datasets such as company surveys and questionnaires, interpreted
real-time feedback lets decision-makers make decisions more effectively and
efficiently. This may be done with the help of sentiment analysis.
In this research, we classify the feedback based on its category and sentiment.
Several classification algorithms are used in opinion mining; two of them are the
Naive Bayes Classifier (NBC) and the Support Vector Machine (SVM). This paper
aims to classify feedback based on sentiment using NBC and SVM.
*ICST, 2018
MONTE CARLO SIMULATION AND CLUSTERING FOR
CUSTOMER SEGMENTATION IN BUSINESS ORGANISATION
Abstract: Utilising data for segmentation analysis can bring a streamlined way to get potential insight as decision-making support in a business organisation. Using an
appropriate data analytics technique helps organisations profile their customer segments accurately. The result is an effective marketing strategy. However, there
are times when the analysis needs another variable whose values are unavailable, for example customer income data, which is mostly hard to
collect. By using Monte Carlo simulation, the value of customer income can be generated and then compared with customer spending to construct a customer segmentation
model. An unsupervised learning approach to customer segmentation using K-Means clustering enables us to see the grouping patterns of customer income against
spending. Clusters in the dataset might be interpreted as groups of customers that have similar characteristics. This paper shows how to generate customer income data
and create data clusters to optimise customer potential by utilising data. Furthermore, the result gives us insight into which groups of customers might not be served properly
considering their average income and their spending behaviour.
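The two steps in this abstract, simulating the missing income variable and then clustering income against spending, can be sketched as follows. Everything specific here is an assumption made for illustration: the triangular income distribution and its parameters, the spending figures, and the helper names `simulate_incomes` and `kmeans`. The paper does not state which distribution or settings were used.

```python
import random

def simulate_incomes(n, low, mode, high, seed=0):
    """Monte Carlo draw of n incomes from a triangular distribution (an assumption)."""
    rng = random.Random(seed)
    return [rng.triangular(low, high, mode) for _ in range(n)]

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels

# Hypothetical monthly spending per customer (arbitrary currency units).
spending = [1.0, 1.2, 0.8, 6.5, 7.0, 6.8, 3.0, 3.2]
incomes = simulate_incomes(len(spending), low=2.0, mode=5.0, high=12.0)
points = list(zip(incomes, spending))
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

A cluster whose centroid shows high simulated income but low spending would correspond to the under-served group the abstract refers to. In practice both variables should be scaled before clustering.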
MAPPING ORGANISATION KNOWLEDGE NETWORK AND
SOCIAL MEDIA BASED REPUTATION MANAGEMENT
Abstract: Knowledge management and reputation are important aspects of an
organisation, especially in the ICT industry. Controlling knowledge management and
modelling personal reputation through social media are essential for the organisation,
because we can see how employees build relationships with their peer
networks or clients virtually and how the knowledge network can support organisational
performance. The purpose of this research is to map the knowledge network and
formulate reputation in order to fully understand how knowledge flows in an
organisation and whether employee reputation has a higher degree of influence in
the organisation's knowledge network. In particular, we develop formulas to measure
the knowledge network and personal reputation based on social media activities.
As a case study, we pick an Indonesian ICT company which actively builds its
business around its employees' peer knowledge outside the company. For the
knowledge network, we collect data by conducting interviews. For
reputation management, we crawl data from several popular social media platforms. We base
our work on Social Network Analysis methodology. The results show that employees'
knowledge is directly proportional to their reputation, but reputation levels
differ across the social media observed in this research.
reputation formula for Twitter, Instagram and LinkedIn
PREDICTION MODELS BASED ON FLIGHT TICKETS AND HOTEL ROOMS DATA
SALES FOR RECOMMENDATION SYSTEM IN ONLINE TRAVEL AGENT BUSINESS
Abstract - Indonesia's position as one of the favourite vacation destinations of domestic and foreign travellers has made the value of investment in the tourism industry
grow significantly. This has created more Online Travel Agent businesses in recent years. However, it has also left many travel and Umrah travel businesses in Indonesia
threatened with bankruptcy, as online travel activity has spread through the conventional market for ticket sales and travel tours. The case study in this research
differs from Online Travel Agent businesses in general because it works with real-time analytics, using flight ticket and hotel room sales data to create
a prediction or recommendation model. Data mining, the extraction of hidden predictive information from large databases, is a powerful technique with great potential
to help companies focus on the most important information in their data warehouse. Using the classification method in data mining, the objective of this paper is to
create predictive models from flight ticket and hotel room sales data with the decision tree classification approach. The result of this paper is beneficial for
businesses, as it can be used as a basic algorithm for programming the recommendation feature of an Online Travel Agent.
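The core step of decision tree classification, choosing the attribute that best separates the classes, can be illustrated with a short sketch. The booking records and attribute names below are hypothetical, and information gain (as in ID3) is an assumed splitting criterion, since the abstract does not say which one the study used.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def best_split(rows, attributes, target):
    """Attribute with the highest information gain, used as the tree root."""
    return max(sorted(attributes), key=lambda a: information_gain(rows, a, target))

# Hypothetical sales records: did a flight buyer also book a hotel room?
bookings = [
    {"route": "domestic", "lead_time": "short", "hotel": "no"},
    {"route": "domestic", "lead_time": "long", "hotel": "no"},
    {"route": "international", "lead_time": "short", "hotel": "yes"},
    {"route": "international", "lead_time": "long", "hotel": "yes"},
    {"route": "domestic", "lead_time": "long", "hotel": "no"},
    {"route": "international", "lead_time": "short", "hotel": "yes"},
]
root = best_split(bookings, ["route", "lead_time"], "hotel")
print(root)
```

Repeating the same choice on each resulting subset grows the full tree, and each path from root to leaf reads as a rule that a recommendation feature could apply.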
EFFECTIVE KNOWLEDGE MANAGEMENT USING
BIG DATA AND SOCIAL NETWORK ANALYSIS
Visualisation of hierarchical organisation structure and knowledge
flow of informal organisation
Abstract: Knowledge management consists of identifying, creating, representing, distributing, and
enabling adoption of insights and experiences in an organisation. One approach to modelling knowledge
management is to use a network model. Big Data is an important part of the ICT technology roadmap,
whose main function is modelling behaviour and supporting organisational decisions. Social Network
Analysis is a micro version of Big Data where we can model and establish social network quantification.
In this paper we show how Social Network Analysis can help an organisation apply Knowledge
Management strategies and practices, through an experiment using a real-world large dataset containing 360000+
email exchanges between 36000+ employees inside an organisation.
business case resolved using SNA methodology
map of the full network of email exchanges between employees at Enron
INDONESIA INFRASTRUCTURE AND CONSUMER STOCK PORTFOLIO
PREDICTION USING ARTIFICIAL NEURAL NETWORK BACKPROPAGATION
*ICOICT, 2017
Abstract: The Artificial Neural Network (ANN) method is increasingly popular for building predictive
models that generate small prediction errors. To produce a good model, ANN needs a large dataset as
input. ANN backpropagation is a gradient descent method that minimises the squared output error.
Stock price movements suit the ANN requirement: they form a large dataset because stock prices
are recorded as often as every second, usually called high-frequency data. The implementation of stock
price prediction using the ANN approach is quite new. The predictive model helps investors build a
stock portfolio and supports their decision-making process. Holding several stocks in a portfolio decreases
diversifiable risk and increases the chance of a higher return. In this paper, we show how to generate a
prediction model of stock prices using artificial neural network backpropagation and how to form a
portfolio from the predicted prices, giving the portfolio prediction with the smallest error. The
dataset we use is historical stock price data from ten different company stocks in the infrastructure and
consumer sectors of the Indonesia Stock Exchange. The results show that under lower-risk conditions the ANN predictive
model gives a higher expected return than the real return, while under higher-risk conditions the
real return is higher than that of the ANN predictive model.
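To make "gradient descent on the squared output error" concrete, here is a minimal backpropagation sketch with one hidden layer, written in plain Python. The price series, window size, network size, learning rate and the `train` helper are all hypothetical choices for illustration and do not reproduce the configuration of the paper.

```python
import math
import random

def train(samples, hidden=3, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer network (tanh units, linear output).

    Returns (predict, history), where history holds the mean squared
    error observed during each epoch.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        return h, sum(w * hj for w, hj in zip(w2, h)) + b2

    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in samples:
            h, out = forward(x)
            err = out - y  # dE/d(out) for E = 0.5 * err**2
            total += err ** 2
            for j in range(hidden):
                # Propagate the error back through the tanh unit
                # before the output weight is changed.
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
            b2 -= lr * err
        history.append(total / len(samples))
    return (lambda x: forward(x)[1]), history

# Hypothetical normalised closing prices; two past prices predict the next one.
prices = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
samples = [((prices[i], prices[i + 1]), prices[i + 2]) for i in range(len(prices) - 2)]
predict, history = train(samples)
print(history[0], history[-1], predict((0.50, 0.55)))
```

Each weight moves against the gradient of the squared error, so the error recorded in `history` should shrink as training proceeds. Real price series are far noisier than this toy sequence.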
THE DYNAMIC OF BANKING NETWORK TOPOLOGY
Case Study: Indonesian Presidential Election Event
ABSTRACT - Information and communication technologies have brought major changes in data storage and processing. Various types and high volumes of
data have been digitalised and support mining-based data processing to provide knowledge in a modern and efficient way. Banking transaction data are
stored digitally and are suitable for the mining process, especially in a network science model. Understanding transaction system risk requires a fundamental study of
payment flows and bank behaviour in various situations. The rapid contagion that followed Lehman Brothers' failure indicates that financial markets
are interdependent and connected to each other in a large network. Thus, a system-wide network approach becomes more important than looking at a single
bank. Political conditions greatly affect economic stability, including the banking and financial sectors. A presidential election is a major political event for a
nation and affects community sentiment and financial markets. However, the linkage between political events and topological changes is poorly
understood. This research presents insight into event-driven dynamic network topology with banking transactions as a case study. We examine the dynamics of the
banking transaction network topology driven by the 2014 Indonesian presidential election. We discover that banks are more engaged with others, in
larger values, 3 days before the end of the campaign period, and less engaged with others, in smaller values, at the end of the campaign period. Unique transaction activity
between banks remains stable, with a small decline at the end of the campaign period. This scenario provides the possibility of learning the banking transaction
pattern and supports the supervision of financial system stability.
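One way to track how a transaction network changes around an event is to compute a few topology measures per day. The sketch below does this for hypothetical interbank transfers. The bank names, day labels and amounts are invented, and the three measures (unique links, density, total value) are an assumed minimal set rather than the metrics reported in the study.

```python
from collections import defaultdict

# Hypothetical interbank transfers: (day, sender, receiver, value).
transfers = [
    ("D-3", "BankA", "BankB", 120.0),
    ("D-3", "BankA", "BankC", 80.0),
    ("D-3", "BankB", "BankC", 200.0),
    ("D-3", "BankA", "BankB", 50.0),
    ("D-0", "BankA", "BankB", 30.0),
    ("D-0", "BankB", "BankC", 20.0),
]

def daily_topology(rows):
    """Per-day unique links, directed density and total transferred value."""
    links = defaultdict(set)
    value = defaultdict(float)
    banks = set()
    for day, src, dst, amount in rows:
        links[day].add((src, dst))  # repeated transfers count as one link
        value[day] += amount
        banks.update((src, dst))
    possible = len(banks) * (len(banks) - 1)  # directed links without self-loops
    return {
        day: {
            "unique_links": len(links[day]),
            "density": len(links[day]) / possible,
            "total_value": value[day],
        }
        for day in links
    }

topology = daily_topology(transfers)
for day, stats in topology.items():
    print(day, stats)
```

Comparing these figures across days shows whether banks connect to more counterparties, and in larger values, as the event approaches.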
A COMPARATIVE STUDY OF EMPLOYEE CHURN PREDICTION
MODEL
Abstract - The churn phenomenon commonly occurs in customer loyalty towards
a brand's products or services. It has become a critical issue that any industry
makes its best effort to avoid. Churn may also arise within the
organisation, where it is called employee churn. Employee churn creates a myriad of
adverse effects for the organisation, as it correlates with unfair workload
distribution, a great deal of money lost and extra time needed to find a
replacement, which may result in a rise in the customer dissatisfaction rate. The
purpose of this study is to find the best model to predict employee churn. A
successful prediction model for employee churn is needed in
order to avert various negative impacts on the organisation. There are three
popular classification models for prediction, namely naïve Bayes, decision
tree, and random forest. This study compares the performance of
these models using Human Resource Information System
(HRIS) data from one of Indonesia's renowned telecommunication companies. The
data collected for the study span a two-year period, from 2015 until
2017. The findings suggest that the best classification model is
random forest, with an accuracy of 97.5%. The second-best
method is naïve Bayes with 96.6%, and the classification model with the lowest
accuracy is decision tree with 88.7%. The study concludes that the most reliable
and accurate classification model to predict employee churn is random forest.
Conclusion
STATISTICS                     DATA ANALYTICS
Confirmative                   Explorative
Small Data Set                 Large Data Set
Small Number of Variables      Large Number of Variables
Deductive (no predictions)     Inductive
Numeric Data                   Numeric and Non-Numeric Data
Clean Data                     Data Cleaning
Complementary Methods
CHALLENGES
Opportunities
• big data provides granular, micro-level data
• big data provides a relatively fast and cheap process
• research opportunities in the maturity of data science methods, implementation and evaluation
• data scientists help big data initiatives move towards future, sustainable economic activities
• uncovering hidden truths and democratisation by data are the primary objectives of data scientists
Challenges
• data scientist talent is hard to find
• data scientist talent is costly to retain
• big data is often a population study, so there is no sampling error => methods familiarity
• benefit > data quality + costs + security
• ML result credibility (different algorithm, different conclusion)
Data	Visualisation
The	Power	of	Data	is	…
every	breath	you	take	
every	move	you	make	
every	bond	you	break	
every	step	you	take	
I’ll be watching you
without Big Data, you are blind
and deaf in the middle of a freeway
- Geoffrey Moore -
Understanding Big Data Analytics for Research Activity

Contenu connexe

Tendances

Framing Trust in Medical AI: Seminar EurAI ACAI
Framing Trust in Medical AI: Seminar EurAI ACAIFraming Trust in Medical AI: Seminar EurAI ACAI
Framing Trust in Medical AI: Seminar EurAI ACAIJose M. Juarez
 
Big Data applications in Health Care
Big Data applications in Health CareBig Data applications in Health Care
Big Data applications in Health CareLeo Barella
 
Introduction on Data Science
Introduction on Data ScienceIntroduction on Data Science
Introduction on Data ScienceEdureka!
 
AI Lab at a Library? Why Artificial Intelligence Matters & What Libraries Can Do
AI Lab at a Library? Why Artificial Intelligence Matters & What Libraries Can DoAI Lab at a Library? Why Artificial Intelligence Matters & What Libraries Can Do
AI Lab at a Library? Why Artificial Intelligence Matters & What Libraries Can DoBohyun Kim
 
Big Data in Medicine
Big Data in MedicineBig Data in Medicine
Big Data in MedicineNasir Arafat
 
A Data Driven Roadmap to Enterprise AI Strategy (Sponsored by Contino) - AWS ...
A Data Driven Roadmap to Enterprise AI Strategy (Sponsored by Contino) - AWS ...A Data Driven Roadmap to Enterprise AI Strategy (Sponsored by Contino) - AWS ...
A Data Driven Roadmap to Enterprise AI Strategy (Sponsored by Contino) - AWS ...Amazon Web Services
 
Applying Data Science and Analytics in Marketing
Applying Data Science and Analytics in MarketingApplying Data Science and Analytics in Marketing
Applying Data Science and Analytics in MarketingData Con LA
 
Artificial Intelligence (A I)
Artificial Intelligence (A I)Artificial Intelligence (A I)
Artificial Intelligence (A I)NaveenXavier7
 
Fraud Detection with Graphs at the Danish Business Authority
Fraud Detection with Graphs at the Danish Business AuthorityFraud Detection with Graphs at the Danish Business Authority
Fraud Detection with Graphs at the Danish Business AuthorityNeo4j
 
Top 8 Data Science Tools | Open Source Tools for Data Scientists | Edureka
Top 8 Data Science Tools | Open Source Tools for Data Scientists | EdurekaTop 8 Data Science Tools | Open Source Tools for Data Scientists | Edureka
Top 8 Data Science Tools | Open Source Tools for Data Scientists | EdurekaEdureka!
 
Into the Future with Artificial Intelligence: Opportunities and Challenges
Into the Future with Artificial Intelligence: Opportunities and ChallengesInto the Future with Artificial Intelligence: Opportunities and Challenges
Into the Future with Artificial Intelligence: Opportunities and ChallengesRobin Teigland
 
Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...Edureka!
 
5 Reasons Why Healthcare Data is Unique and Difficult to Measure
5 Reasons Why Healthcare Data is Unique and Difficult to Measure5 Reasons Why Healthcare Data is Unique and Difficult to Measure
5 Reasons Why Healthcare Data is Unique and Difficult to MeasureHealth Catalyst
 
Business Intelligence and Business Analytics
Business Intelligence and Business AnalyticsBusiness Intelligence and Business Analytics
Business Intelligence and Business Analyticssnehal_152
 

Tendances (20)

Data Analytics Training Course
Data Analytics Training CourseData Analytics Training Course
Data Analytics Training Course
 
Framing Trust in Medical AI: Seminar EurAI ACAI
Framing Trust in Medical AI: Seminar EurAI ACAIFraming Trust in Medical AI: Seminar EurAI ACAI
Framing Trust in Medical AI: Seminar EurAI ACAI
 
Big Data applications in Health Care
Big Data applications in Health CareBig Data applications in Health Care
Big Data applications in Health Care
 
Introduction on Data Science
Introduction on Data ScienceIntroduction on Data Science
Introduction on Data Science
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Business Intelligence
Business IntelligenceBusiness Intelligence
Business Intelligence
 
AI Lab at a Library? Why Artificial Intelligence Matters & What Libraries Can Do
AI Lab at a Library? Why Artificial Intelligence Matters & What Libraries Can DoAI Lab at a Library? Why Artificial Intelligence Matters & What Libraries Can Do
AI Lab at a Library? Why Artificial Intelligence Matters & What Libraries Can Do
 
Big Data in Medicine
Big Data in MedicineBig Data in Medicine
Big Data in Medicine
 
Big data
Big dataBig data
Big data
 
A Data Driven Roadmap to Enterprise AI Strategy (Sponsored by Contino) - AWS ...
A Data Driven Roadmap to Enterprise AI Strategy (Sponsored by Contino) - AWS ...A Data Driven Roadmap to Enterprise AI Strategy (Sponsored by Contino) - AWS ...
A Data Driven Roadmap to Enterprise AI Strategy (Sponsored by Contino) - AWS ...
 
Applying Data Science and Analytics in Marketing
Applying Data Science and Analytics in MarketingApplying Data Science and Analytics in Marketing
Applying Data Science and Analytics in Marketing
 
data mining
data mining data mining
data mining
 
Artificial Intelligence (A I)
Artificial Intelligence (A I)Artificial Intelligence (A I)
Artificial Intelligence (A I)
 
Data science Big Data
Data science Big DataData science Big Data
Data science Big Data
 
Fraud Detection with Graphs at the Danish Business Authority
Fraud Detection with Graphs at the Danish Business AuthorityFraud Detection with Graphs at the Danish Business Authority
Fraud Detection with Graphs at the Danish Business Authority
 
Top 8 Data Science Tools | Open Source Tools for Data Scientists | Edureka
Top 8 Data Science Tools | Open Source Tools for Data Scientists | EdurekaTop 8 Data Science Tools | Open Source Tools for Data Scientists | Edureka
Top 8 Data Science Tools | Open Source Tools for Data Scientists | Edureka
 
Into the Future with Artificial Intelligence: Opportunities and Challenges
Into the Future with Artificial Intelligence: Opportunities and ChallengesInto the Future with Artificial Intelligence: Opportunities and Challenges
Into the Future with Artificial Intelligence: Opportunities and Challenges
 
Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...
 
5 Reasons Why Healthcare Data is Unique and Difficult to Measure
5 Reasons Why Healthcare Data is Unique and Difficult to Measure5 Reasons Why Healthcare Data is Unique and Difficult to Measure
5 Reasons Why Healthcare Data is Unique and Difficult to Measure
 
Business Intelligence and Business Analytics
Business Intelligence and Business AnalyticsBusiness Intelligence and Business Analytics
Business Intelligence and Business Analytics
 

Similaire à Understanding Big Data Analytics for Research Activity

Peran Generasi Milenial di Era 4.0
Peran Generasi Milenial di Era 4.0Peran Generasi Milenial di Era 4.0
Peran Generasi Milenial di Era 4.0Andry Alamsyah
 
INTRODUCTION TO DATA SCIENCE -CONCEPTS.pptx
INTRODUCTION TO DATA SCIENCE -CONCEPTS.pptxINTRODUCTION TO DATA SCIENCE -CONCEPTS.pptx
INTRODUCTION TO DATA SCIENCE -CONCEPTS.pptxMadhumitha N
 
The Unleashing the Power of AI & How Machine Learning is Revolutionizing Ever...
The Unleashing the Power of AI & How Machine Learning is Revolutionizing Ever...The Unleashing the Power of AI & How Machine Learning is Revolutionizing Ever...
The Unleashing the Power of AI & How Machine Learning is Revolutionizing Ever...Ethical Consultant Services
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
 
Fundamental Areas Of Study In Data Science.pdf
Fundamental Areas Of Study In Data Science.pdfFundamental Areas Of Study In Data Science.pdf
Fundamental Areas Of Study In Data Science.pdfBPBOnline
 
Essential concepts for machine learning
Essential concepts for machine learning Essential concepts for machine learning
Essential concepts for machine learning pyingkodi maran
 
Data Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdfData Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdfmustaq4
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAjaved75
 
Data Science & Application.pdf
Data Science & Application.pdfData Science & Application.pdf
Data Science & Application.pdfGyanaranjanSahoo62
 
Introduction to Data Science: Unveiling Insights Hidden in Data
Introduction to Data Science: Unveiling Insights Hidden in DataIntroduction to Data Science: Unveiling Insights Hidden in Data
Introduction to Data Science: Unveiling Insights Hidden in Datahemayadav41
 
Artificial Intelligence in Cyber Security Research Paper Writing.pptx
Artificial Intelligence in Cyber Security Research Paper Writing.pptxArtificial Intelligence in Cyber Security Research Paper Writing.pptx
Artificial Intelligence in Cyber Security Research Paper Writing.pptxkellysmith617941
 
Introduction to AI and its domains.pptx
Introduction to AI and its domains.pptxIntroduction to AI and its domains.pptx
Introduction to AI and its domains.pptxNeeru Mittal
 
Understanding-Artificial-Intelligence-in-Research (1).pptx
Understanding-Artificial-Intelligence-in-Research (1).pptxUnderstanding-Artificial-Intelligence-in-Research (1).pptx
Understanding-Artificial-Intelligence-in-Research (1).pptxForum of Blended Learning
 

Similaire à Understanding Big Data Analytics for Research Activity (20)

Peran Generasi Milenial di Era 4.0
Peran Generasi Milenial di Era 4.0Peran Generasi Milenial di Era 4.0
Peran Generasi Milenial di Era 4.0
 
INTRODUCTION TO DATA SCIENCE -CONCEPTS.pptx
INTRODUCTION TO DATA SCIENCE -CONCEPTS.pptxINTRODUCTION TO DATA SCIENCE -CONCEPTS.pptx
INTRODUCTION TO DATA SCIENCE -CONCEPTS.pptx
 
The Unleashing the Power of AI & How Machine Learning is Revolutionizing Ever...
The Unleashing the Power of AI & How Machine Learning is Revolutionizing Ever...The Unleashing the Power of AI & How Machine Learning is Revolutionizing Ever...
The Unleashing the Power of AI & How Machine Learning is Revolutionizing Ever...
 
DATAIA & TransAlgo
DATAIA & TransAlgoDATAIA & TransAlgo
DATAIA & TransAlgo
 
Untitled document.pdf
Untitled document.pdfUntitled document.pdf
Untitled document.pdf
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
Fundamental Areas Of Study In Data Science.pdf
Fundamental Areas Of Study In Data Science.pdfFundamental Areas Of Study In Data Science.pdf
Fundamental Areas Of Study In Data Science.pdf
 
Essential concepts for machine learning
Essential concepts for machine learning Essential concepts for machine learning
Essential concepts for machine learning
 
Data Science - NXT Level_Dr.Arun.pdf
Data Science - NXT Level_Dr.Arun.pdfData Science - NXT Level_Dr.Arun.pdf
Data Science - NXT Level_Dr.Arun.pdf
 
Information entanglement
Information entanglementInformation entanglement
Information entanglement
 
data science
data sciencedata science
data science
 
data science
data sciencedata science
data science
 
Data Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdfData Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdf
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
 
Machine learning in Banks
Machine learning in BanksMachine learning in Banks
Machine learning in Banks
 
Data Science & Application.pdf
Data Science & Application.pdfData Science & Application.pdf
Data Science & Application.pdf
 
Introduction to Data Science: Unveiling Insights Hidden in Data
Introduction to Data Science: Unveiling Insights Hidden in DataIntroduction to Data Science: Unveiling Insights Hidden in Data
Introduction to Data Science: Unveiling Insights Hidden in Data
 
Artificial Intelligence in Cyber Security Research Paper Writing.pptx
Artificial Intelligence in Cyber Security Research Paper Writing.pptxArtificial Intelligence in Cyber Security Research Paper Writing.pptx
Artificial Intelligence in Cyber Security Research Paper Writing.pptx
 
Introduction to AI and its domains.pptx
Introduction to AI and its domains.pptxIntroduction to AI and its domains.pptx
Introduction to AI and its domains.pptx
 
Understanding-Artificial-Intelligence-in-Research (1).pptx
Understanding-Artificial-Intelligence-in-Research (1).pptxUnderstanding-Artificial-Intelligence-in-Research (1).pptx
Understanding-Artificial-Intelligence-in-Research (1).pptx
 

Plus de Andry Alamsyah

Central Bank Digital Currency (CBDC): Best Practice and Technical Considerations
Central Bank Digital Currency (CBDC): Best Practice and Technical ConsiderationsCentral Bank Digital Currency (CBDC): Best Practice and Technical Considerations
Central Bank Digital Currency (CBDC): Best Practice and Technical ConsiderationsAndry Alamsyah
 
Artificial Neural Network for Predicting Indonesia Stock Exchange Composite u...
Artificial Neural Network for Predicting Indonesia Stock Exchange Composite u...Artificial Neural Network for Predicting Indonesia Stock Exchange Composite u...
Artificial Neural Network for Predicting Indonesia Stock Exchange Composite u...Andry Alamsyah
 
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELING
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELINGDYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELING
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELINGAndry Alamsyah
 
Finding Pattern in Dynamic Network Analysis
Finding Pattern in Dynamic Network AnalysisFinding Pattern in Dynamic Network Analysis
Finding Pattern in Dynamic Network AnalysisAndry Alamsyah
 
Ontology Modelling Approach for Personality Measurement based on Social Media...
Ontology Modelling Approach for Personality Measurement based on Social Media...Ontology Modelling Approach for Personality Measurement based on Social Media...
Ontology Modelling Approach for Personality Measurement based on Social Media...Andry Alamsyah
 
Open Data Analytical Model for Human Development Index to Support Government ...
Open Data Analytical Model for Human Development Index to Support Government ...Open Data Analytical Model for Human Development Index to Support Government ...
Open Data Analytical Model for Human Development Index to Support Government ...Andry Alamsyah
 
Hybrid sentiment and network analysis of social opinion polarization icoict
Hybrid sentiment and network analysis of social opinion polarization   icoictHybrid sentiment and network analysis of social opinion polarization   icoict
Hybrid sentiment and network analysis of social opinion polarization icoictAndry Alamsyah
 
Pilkada DKI 2017 Social Network Model (Early Report)
Pilkada DKI 2017 Social Network Model (Early Report)Pilkada DKI 2017 Social Network Model (Early Report)
Pilkada DKI 2017 Social Network Model (Early Report)Andry Alamsyah
 
Understanding new digital economy
Understanding new digital economyUnderstanding new digital economy
Understanding new digital economyAndry Alamsyah
 
Dissemination of Awareness Evolution “What is really going on?” Pilkada 2015 ...
Dissemination of Awareness Evolution “What is really going on?” Pilkada 2015 ...Dissemination of Awareness Evolution “What is really going on?” Pilkada 2015 ...
Dissemination of Awareness Evolution “What is really going on?” Pilkada 2015 ...Andry Alamsyah
 
Big Data Analytics : A Social Network Approach
Big Data Analytics : A Social Network ApproachBig Data Analytics : A Social Network Approach
Big Data Analytics : A Social Network ApproachAndry Alamsyah
 
Social Network, Metrics and Computational Problem
Social Network, Metrics and Computational ProblemSocial Network, Metrics and Computational Problem
Social Network, Metrics and Computational ProblemAndry Alamsyah
 
Jejaring Sosial untuk Peneliti dan Litbang
Jejaring Sosial untuk Peneliti dan LitbangJejaring Sosial untuk Peneliti dan Litbang
Jejaring Sosial untuk Peneliti dan LitbangAndry Alamsyah
 
Social network for academics
Social network for academicsSocial network for academics
Social network for academicsAndry Alamsyah
 
Data Mining vs Statistics
Data Mining vs StatisticsData Mining vs Statistics
Data Mining vs StatisticsAndry Alamsyah
 

Plus de Andry Alamsyah (19)

Understanding Big Data Analytics for Research Activity

  • 1. BIG DATA ANALYTICS: UNDERSTANDING FOR RESEARCH ACTIVITY. Dr. Andry Alamsyah, Asosiasi Ilmuwan Data Indonesia; School of Economics and Business, Telkom University
  • 2. Introduction
    Andry Alamsyah
      - Researcher / Data Scientist
      - Director of Digital Business Ecosystem Research Centre
      - Chief and Founder of Lab. Social Computing & Big Data
      - Chairman & Founder, Indonesian Data Scientist Society (AIDI)
    Research Field: Social Computing, Social Network, Complex Network / Network Science, Computational Social Science, Data Analytics, Big Data, Data Mining, Graph Theory, Disruptive Innovation / Disruptive Economy, ICT Entrepreneurial Business, Data / Information Business
    Education:
      - S1 (bachelor's): Mathematics, ITB. Topic: Statistics
      - S2 (master's): Informatics, UPJV, France. Topic: Information System and Multimedia
      - S3 (doctorate): Electro and Informatics, ITB. Topic: Social Network and Big Data
    Links:
      - email: andry.alamsyah@gmail.com
      - blog: andrya.staff.telkomuniversity.ac.id
      - repository: telkomuniversity.academia.edu/andryalamsyah
      - repository: researchgate.net/profile/Andry_Alamsyah
      - linkedin: linkedin.com/andry.alamsyah
      - twitter: twitter.com/andrybrew
  • 3. Agenda
      - Background and Motivation
      - Big Data Definition and Related Field
      - Understanding Pattern
      - Data Analytics / Machine Learning Fundamental (Prediction and Recommendation)
      - Social Media Analytics (by Case Study)
      - Conclusion
      - Working on Your Computer (Machine Learning Practice)
  • 9. Industry 4.0
      - Industry 4.0 -> cyber physical system -> enabling humans to produce large-scale data -> human behaviour quantification
      - Key technologies: data, computational power and connectivity; analytics and intelligence; human machine interaction; advanced production methods
    (Figure source: Deloitte, Industry 4.0)
  • 11. Industry 4.0: Cheap Changes Everything
      - cutting through the BIG DATA hype
      - cheap means everywhere
      - cheap creates value
      - from cheap to strategy
    Keywords from the slide diagram: efficient economy; new value proposition; complex human behaviour; market uncertainty; business sustainability; disruptive economy; coopetitive, cooperative, competitive; business ecosystem / platform; programmable economy; event driven API economy; toward large-scale and massive socio-economic impact
  • 13. Big Data Definition
      - A term: describes extremely large amounts of structured and unstructured data.
      - The activity: capture, storage, processing, sharing and reporting of data, beyond the ability of legacy software tools and hardware infrastructure.
      - Related to many branches of “science”: data analytics, data science, machine learning, artificial intelligence, IoT, and many more.
      - The application: in many fields, for efficient, cost-effective, faster and more accurate decision making.
    Units:
      Gigabyte   10^9  = 1.000.000.000
      Terabyte   10^12 = 1.000.000.000.000
      Petabyte   10^15 = 1.000.000.000.000.000
      Exabyte    10^18 = 1.000.000.000.000.000.000
      Zettabyte  10^21 = 1.000.000.000.000.000.000.000
    Drive capacity versus read speed:
      1990: store 1400 MB, transfer speed 4.5 MB/s, read the whole drive in ~5 minutes
      2010: store 1 TB, transfer speed 100 MB/s, read the whole drive in ~3 hours
      Hadoop: 100 drives working at the same time can read 1 TB of data in 2 minutes
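The read times quoted on this slide follow from dividing capacity by transfer speed. The short sketch below redoes that arithmetic; it assumes decimal units (1 TB = 1.000.000 MB) and an even spread of the data over the 100 drives, and the function name is only illustrative.

```python
# Back-of-the-envelope check of the drive figures quoted on the slide.
MB_PER_TB = 1_000_000  # assumption: decimal units, matching the slide's powers of ten


def read_time_seconds(capacity_mb: float, speed_mb_per_s: float, drives: int = 1) -> float:
    """Time to read capacity_mb when it is spread evenly over `drives` drives read in parallel."""
    return capacity_mb / (speed_mb_per_s * drives)


t_1990 = read_time_seconds(1400, 4.5)                  # one 1990-era drive
t_2010 = read_time_seconds(MB_PER_TB, 100)             # one 2010-era 1 TB drive
t_parallel = read_time_seconds(MB_PER_TB, 100, 100)    # the same 1 TB over 100 drives

print(f"1990 drive: {t_1990 / 60:.1f} minutes")
print(f"2010 drive: {t_2010 / 3600:.1f} hours")
print(f"100 drives: {t_parallel / 60:.1f} minutes")
```

The results come out at roughly 5 minutes, just under 3 hours, and just under 2 minutes, which is the motivation the slide gives for reading many drives in parallel.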
  • 15. Big Data Related Terms (Use Case)
      - SOURCE: review, opinion, historical data, conversation, network friendship, CCTV, vlog, location tagging, etc.
      - BIG DATA: large, fast, complex data; the 5V’s
      - DATA ANALYTICS: the discovery, interpretation, and communication of meaningful patterns in data (Wikipedia); the process to uncover hidden patterns, unknown correlations, and other useful information that can help organisations make more informed business decisions
      - DATA SCIENCE: the science to extract knowledge / pattern from data
      - SOCIAL COMPUTING: quantification of human / social behaviour
      - INSIGHT: market segmentation, risk analytics, information dissemination, recommended investment, fraud detection, personalised advertising, customer acquisition and retention, purchase behaviour, early event detection, brand awareness, etc.
    Diagram labels: opportunity, activity, methodology, benefit, application
  • 16. Data Analytics
      - The discovery, interpretation, and communication of meaningful patterns in data (Wikipedia)
      - The process to uncover hidden patterns, unknown correlations, and other useful information that can help organisations make more informed business decisions
    Types: predictive, descriptive, diagnostic, prescriptive.
  • 17. Predictive Analytics
      - Study the past if you want to study the future (Confucius)
      - Predictive Analytics is the art of building and using models that make predictions based on patterns extracted from historical data. Predictive analytics applications include: price predictions, dosage predictions, risk assessment, propensity/likelihood modelling, diagnosis, document classification
      - Prediction is the assignment of a value to any unknown variable.
      - A model is trained to make predictions based on a set of historical examples (we use Machine Learning).
  • 18. Data Science Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.
  • 21. CRISP-DM: the Cross-Industry Standard Process for Data Mining is an open standard process model that describes common approaches used by data mining experts. It is the most widely used analytics model.
  • 22. Data Type: Structured vs Unstructured
    Structured: high degree of organization, such as a relational database.
      Column          Value
      Patient         Andry Alamsyah
      Date of Birth   12/07/1995
      Date Admitted   02/03/2019
    Unstructured: information that is difficult to organise using traditional mechanisms.
      “The patient came in complaining of chest pain, shortness of breath, and lingering headaches.. Smokes 2 packs a day.. Family history of heart disease.. Has been experiencing similar symptoms for the past 12 hours…”
  • 32. How Can (Big) Data Analytics Help? By describing the phenomenon, by predicting the value, by estimating the future outcome, by optimising the resources and the decision, by simulating all the possible scenarios.
  • 34. Machine Learning
      - Machine learning is defined as an automated process that extracts patterns from data to build the models used in predictive analytics applications.
      - A branch of artificial intelligence, concerned with the design and development of algorithms that allow computers to evolve behaviours based on empirical data.
  • 35. Machine Learning
    Machine Learning is the idea of learning from examples and experience, without being explicitly programmed. Instead of writing code, we feed data to a generic algorithm, and it builds logic based on the data given.
      - Traditional Programming: Data + Program -> Computer -> Output
      - Machine Learning: Data + Output -> Computer -> Program
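The contrast between the two diagrams can be sketched in a few lines. In the first function a person writes the rule; in the second, a rule is derived from (input, output) examples. The temperature data, the hand-picked threshold of 30 and the simple threshold learner are all assumptions made for illustration, not material from the slides.

```python
def traditional_program(temperature_c: float) -> bool:
    """Traditional programming: a person writes the rule (30 is a hand-picked value)."""
    return temperature_c > 30.0


def learn_threshold(examples):
    """Machine learning, in miniature: derive the rule from (input, output) examples.

    Tries the midpoints between neighbouring inputs and keeps the threshold
    that misclassifies the fewest examples.
    """
    xs = sorted(x for x, _ in examples)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        return sum((x > t) != label for x, label in examples)

    return min(candidates, key=errors)


# Hypothetical labelled data: (temperature, "is it hot?")
examples = [(18.0, False), (22.0, False), (27.0, False), (31.0, True), (35.0, True)]
threshold = learn_threshold(examples)


def learned_program(temperature_c: float) -> bool:
    """The 'program' produced from data: same shape as the hand-written rule."""
    return temperature_c > threshold


print(threshold, learned_program(40.0), learned_program(10.0))
```

Both functions end up with the same form; the difference is only where the threshold comes from.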
  • 36. Machine Learning
    Machine learning (ML) is the science of getting computers to act without being explicitly programmed. ML has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. ML is pervasive today; we probably use it dozens of times a day without knowing it. It is the best way to make progress towards human-level AI. (Stanford/Coursera)
    ML is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. ML focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data. (whatis.com)
  • 38. Machine Learning in Business
    Finance and Banking
      - Credit scoring
      - Fraud detection
      - Risk analysis
      - Portfolio optimization
      - Client analysis
      - Trading exchange forecasting
    Retail and E-Commerce
      - Price optimization
      - Recommendation
      - Predictive inventory planning
      - Fraud detection
      - Customer segmentation
    Manufacturing
      - Predictive maintenance or condition monitoring
      - Warranty reserve estimation
      - Demand forecasting
      - Process optimization
    Marketing and Sales
      - Market and customer segmentation
      - Price optimization
      - Customer churn analysis
      - Customer lifetime value prediction
      - Sentiment analysis in social networks
  • 41. Supervised Learning
      - It is based on a labeled training set.
      - The class of each piece of data in the training set is known.
      - Class labels are pre-determined and provided in the training phase.
    Task performed: classification, pattern recognition.
    [Figure: labeled data points from two classes (A and B) and an unlabeled point: “What is the class of this data point?”]
  • 43. Supervised Learning Problems
      - Classification
          - The domain of the target attribute is finite and categorical.
          - A classifier must assign a class to an unseen example.
      - Regression
          - The target attribute takes infinitely many (numeric) values.
          - The goal is to fit a model that learns the output target attribute as a function of the input attributes.
      - Time Series Analysis
          - Making predictions in time.
  • 46. Unsupervised Learning
      - Input: a set of patterns P, from an n-dimensional space S, but little or no information about their classification, evaluation, interesting features, etc. It must learn these by itself! :)
      - Tasks:
          - Clustering: group patterns based on similarity.
          - Vector Quantisation: fully divide up S into a small set of regions (defined by codebook vectors) that also helps cluster P.
          - Feature Extraction: reduce the dimensionality of S by removing unimportant features (i.e. those that do not help in clustering P).
      - There is no supervisor and only input data is available.
      - The aim is now to find regularities, irregularities, relationships, similarities and associations in the input.
  • 47. Unsupervised Learning Problems
      - Clustering
      - Association Rules
      - Pattern Mining
          - It is adopted as a more general term than frequent pattern mining or association mining.
      - Outlier Detection
          - It is the process of finding data whose behaviour is very different from the expectation (outliers or anomalies).
  • 50. Reinforcement Learning
    Background:
      - How to learn a new skill
      - Learning and intelligence
      - Interaction with environment
      - Goal-oriented learning
      - Agent – Environment interactions
      - Activities
          - What to do
          - How to map situations to actions
          - Process positive and negative rewards
    Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximise some notion of cumulative reward. Basic reinforcement learning is modeled as a Markov decision process, which is often stochastic.
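To make the agent, environment, action and reward vocabulary concrete, here is a minimal sketch using tabular Q-learning, which is one common RL algorithm and is not named on the slide. The environment (five positions on a line with a reward at the right end), the learning rate, the discount factor and the purely random exploration are all assumptions chosen to keep the example small.

```python
import random

N_STATES = 5            # positions 0..4 on a line; position 4 is the goal
ACTIONS = (-1, +1)      # step left or step right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (illustrative values)


def step(state, action):
    """Environment: move along the line; reward 1 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=200, seed=0):
    """Agent: learn action values from experience gathered with random actions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            a = rng.randrange(2)  # explore: pick an action at random
            next_state, reward = step(state, ACTIONS[a])
            # Move the estimate towards reward + discounted best future value.
            target = reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q = q_learning()
# Greedy policy: in each non-goal state, take the action with the highest value.
policy = [ACTIONS[row.index(max(row))] for row in q[:-1]]
print(policy)
```

After training, the greedy policy steps right in every non-goal state, which is the mapping from situations to actions that maximises the cumulative reward in this toy environment.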
  • 51. Reinforcement Learning: The Analogy
      - A child learns to walk
      - The child is an agent trying to manipulate the environment
      - The child is taking actions (state 1, state 2, state 3, and so on)
      - Positive rewards when able to walk
      - Negative rewards when not able to walk
  • 52. Reinforcement Learning: Practical Applications
      - RL can be used in robotics for industrial automation.
      - RL can be used in machine learning and data processing.
      - RL can be used to create training systems that provide custom instruction and materials according to the requirements of students.
  • 53. Data Preparation (CRISP-DM): Data Preprocessing
    Measures for data quality, a multidimensional view:
      - Accuracy: correct or wrong, accurate or not
      - Completeness: not recorded, unavailable, …
      - Consistency: some modified but some not, …
      - Timeliness: is the data updated in time?
      - Believability: how far can the data be trusted to be correct?
      - Interpretability: how easily can the data be understood?
  • 54. Data Preprocessing Tasks
    1. Data Cleaning
       a. Fill in missing values
       b. Smooth noisy data
       c. Identify or remove outliers
       d. Resolve inconsistencies
    2. Data Reduction
       a. Dimensionality reduction
       b. Numerosity reduction
       c. Data compression
    3. Data Transformation and Data Discretisation
       a. Normalisation
       b. Concept hierarchy generation
    4. Data Integration
       a. Integration of multiple databases or files
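Two of the tasks listed on this slide, filling in missing values (1a) and normalisation (3a), can be sketched with the standard library alone. Mean imputation and min-max scaling are only one possible choice for each task, and the sample values are made up for illustration.

```python
from statistics import mean


def fill_missing(values):
    """Data cleaning: replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalise(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


raw = [10.0, None, 30.0, 20.0]       # hypothetical column with one missing value
cleaned = fill_missing(raw)          # the gap is filled with the mean of 10, 30, 20
scaled = min_max_normalise(cleaned)  # smallest value maps to 0, largest to 1
print(cleaned, scaled)
```

In practice the fill value and the scaling range would normally be computed on the training data only and then reused on new data.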
  • 55. Common Data Analytics Rules
    Classification
      Description: Predict if data points belong to one of the predefined classes. Prediction based on learning from a known dataset.
      Algorithms: Decision tree, neural network
      Example: Bucketing new customers into one of the known customer groups
    Regression
      Description: Predict the numeric target label of a data point. Prediction based on learning from a known dataset.
      Algorithms: Linear regression, logistic regression
      Example: Estimating insurance premium
    Clustering
      Description: Identify natural clusters within the data set based on inherent properties within the data set.
      Algorithms: K-Means, density based clustering
      Example: Finding customer segments in a company based on transaction and call data
    Association Rules
      Description: Identify relationships within an item set based on transaction data.
      Algorithms: FP-Growth algorithm, Apriori
      Example: Finding cross-selling opportunities for a retailer based on transaction purchase history
    Anomaly Detection
      Description: Predict if a data point is an outlier compared to other data points in the dataset.
      Algorithms: Distance based, density based, Local Outlier Factor (LOF)
      Example: Fraud transaction detection in credit cards
  • 56. Estimation: Pizza Delivery Time
      Customer   Order   Number of Traffic Lights   Distance   Travel Time (label)
      1          3       3                          3          16
      2          1       7                          4          20
      3          2       4                          6          18
      4          4       6                          8          36
      ...
      1000       2       4                          2          12
    Learning: model built using an estimation method (Linear Regression)
    Knowledge: Travel Time = 0.48 O + 0.23 TL + 0.5 D
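Once a linear model has been learned, using it is a single weighted sum. The sketch below simply evaluates the formula printed on the slide, reading O as the order count, TL as the number of traffic lights and D as the distance. The coefficients are copied as shown and are not checked against the sample rows, so the output should not be expected to reproduce the listed travel times; the function name is hypothetical.

```python
def travel_time(orders: float, traffic_lights: float, distance: float) -> float:
    """Evaluate the slide's formula: Travel Time = 0.48 O + 0.23 TL + 0.5 D."""
    return 0.48 * orders + 0.23 * traffic_lights + 0.5 * distance


# Apply the learned "knowledge" to a new delivery (inputs chosen for illustration).
estimate = travel_time(orders=3, traffic_lights=3, distance=3)
print(f"estimated travel time: {estimate:.2f}")
```

The point of the estimation step is that the coefficients come from fitting historical rows; after that, predicting for an unseen customer needs only the three input values.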
  • 57. Predictions: stock price dataset in time series format (with label); learning and prediction using a Neural Network; prediction plot. [Figures omitted]
  • 58. Classification
      Student ID (NIM)   Gender   National Exam Score (Nilai UN)   School of Origin   Sem. GPA 1   Sem. GPA 2   Sem. GPA 3   Sem. GPA 4   ...   Graduated on Time (label)
      10001              M        28                               SMAN 2             3.3          3.6          2.89         2.9                Yes
      10002              F        27                               SMA DK             4.0          3.2          3.8          3.7                No
      10003              F        24                               SMAN 1             2.7          3.4          4.0          3.5                No
      10004              M        26.4                             SMAN 3             3.2          2.7          3.6          3.4                Yes
      ...
      11000              M        23.4                             SMAN 5             3.3          2.8          3.1          3.2                Yes
    Learning: using the C4.5 classification method
  • 59. Classification
    input: golf playing recommendation
    output (rules):
      If outlook = sunny and humidity = high then play = no
      If outlook = rainy and windy = true then play = no
      If outlook = overcast then play = yes
      If humidity = normal then play = yes
      If none of the above then play = yes
    output (tree): [figure omitted]
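The rule output above can be read as an ordered decision list: the first rule whose condition holds decides the answer. A minimal sketch follows, assuming string values for outlook and humidity and a boolean for windy; the function name is hypothetical.

```python
def play_golf(outlook: str, humidity: str, windy: bool) -> str:
    """Apply the slide's rules in order; the first matching rule wins."""
    if outlook == "sunny" and humidity == "high":
        return "no"
    if outlook == "rainy" and windy:
        return "no"
    if outlook == "overcast":
        return "yes"
    if humidity == "normal":
        return "yes"
    return "yes"  # default rule: none of the above


print(play_golf("sunny", "high", False))    # first rule applies
print(play_golf("rainy", "normal", True))   # second rule applies before the humidity rule
print(play_golf("overcast", "high", True))  # third rule applies
```

Order matters here: a rainy, windy day with normal humidity matches both the second and the fourth rule, and the earlier one decides.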
  • 60. Clustering: dataset without label; learning using the K-means clustering method. [Figures omitted]
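K-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below is a bare-bones version with a fixed number of iterations; the 2-D points and the starting centroids are made up for illustration, and a real run would usually pick starting centroids at random and stop on convergence.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain K-means: repeat the assignment step and the centroid update step."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

No labels are used anywhere: the grouping comes only from the distances between points, which is what makes this an unsupervised method.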
  • 62. Algorithms in Data Analytics
    1. Estimation: Linear Regression, Neural Network, Support Vector Machine, etc.
    2. Prediction/Forecasting: Linear Regression, Neural Network, Support Vector Machine, etc.
    3. Classification: Naive Bayes, K-Nearest Neighbor, C4.5, ID3, CART, Linear Discriminant Analysis, Logistic Regression, etc.
    4. Clustering: K-Means, K-Medoids, Self-Organizing Map (SOM), Fuzzy C-Means, etc.
    5. Association: FP-Growth, Apriori, Coefficient of Correlation, Chi Square, etc.
  • 65. Probability-Based Learning
    Probability-based prediction approaches are heavily based on Bayes’ Theorem.
      - A probabilistic framework for solving classification problems
      - Conditional probability / Bayes’ Theorem:
        $$P(C \mid A) = \frac{P(A \mid C)\,P(C)}{P(A)}$$
      - Given:
          - A doctor knows that meningitis causes stiff neck 50% of the time
          - Prior probability of any patient having meningitis is 1/50,000
          - Prior probability of any patient having stiff neck is 1/20
      - If a patient has stiff neck, what’s the probability he/she has meningitis?
        $$P(M \mid S) = \frac{P(S \mid M)\,P(M)}{P(S)} = \frac{0.5 \times 1/50000}{1/20} = 0.0002$$
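The meningitis example is a direct substitution into Bayes' Theorem, and it can be checked in a couple of lines. The function and variable names below are illustrative only.

```python
def posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Bayes' Theorem: P(C|A) = P(A|C) * P(C) / P(A)."""
    return likelihood * prior / evidence


p_stiff_neck_given_meningitis = 0.5   # P(S|M)
p_meningitis = 1 / 50_000             # P(M)
p_stiff_neck = 1 / 20                 # P(S)

p_meningitis_given_stiff_neck = posterior(
    p_stiff_neck_given_meningitis, p_meningitis, p_stiff_neck
)
print(f"P(M|S) = {p_meningitis_given_stiff_neck:.4f}")
```

Even though meningitis causes a stiff neck half of the time, the posterior stays tiny because the prior for meningitis is so much smaller than the prior for a stiff neck.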
  • 67. Model Evaluation
    1. Estimation
       - Error: Root Mean Square Error (RMSE), MSE, MAPE, etc.
    2. Prediction/Forecasting
       - Error: Root Mean Square Error (RMSE), MSE, MAPE, etc.
    3. Classification
       - Confusion Matrix: Accuracy
       - ROC Curve: Area Under Curve (AUC)
    4. Clustering
       - Internal Evaluation: Davies–Bouldin index, Dunn index
       - External Evaluation: Rand measure, F-measure, Jaccard index, Fowlkes–Mallows index, Confusion matrix
    5. Association
       - Lift Charts: Lift Ratio
       - Precision and Recall (F-measure)
  • 68. Model Evaluation: learning and evaluation process, confusion matrix, evaluation metric
    Confusion matrix:
                                PREDICTED CLASS
                                Class=Yes   Class=No
      ACTUAL CLASS  Class=Yes   a           b
                    Class=No    c           d
      a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)
    Evaluation metrics:
      $$\text{Accuracy} = \frac{a + d}{a + b + c + d} = \frac{TP + TN}{TP + TN + FP + FN}$$
      $$\text{Precision } (p) = \frac{a}{a + c}$$
      $$\text{Recall } (r) = \frac{a}{a + b}$$
      $$\text{F-measure } (F) = \frac{2rp}{r + p} = \frac{2a}{2a + b + c}$$
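The four metrics on this slide can be computed directly from the confusion matrix cells. In the sketch below the cell counts are made-up numbers used only to exercise the formulas, and the function name is hypothetical.

```python
def evaluation_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F-measure from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)   # a / (a + c)
    recall = tp / (tp + fn)      # a / (a + b)
    f_measure = 2 * recall * precision / (recall + precision)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts: a = TP, b = FN, c = FP, d = TN
result = evaluation_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these counts the F-measure equals 2a / (2a + b + c) = 80 / 95, which matches the second form of the formula on the slide.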
  • 69. Model Evaluation • Learning curve shows how accuracy changes with varying sample size • Requires a sampling schedule for creating learning curve: - Arithmetic sampling - Geometric sampling • Effect of small sample size: - Bias in the estimate - Variance of estimate
  • 70. K-Cross Validation (Increase Coverage)
      Experiment   Accuracy
      1            93%
      2            91%
      3            90%
      4            93%
      5            93%
      6            91%
      7            94%
      8            93%
      9            91%
      10           90%
      Average      92%
    [Figure: in each experiment's dataset row, the orange box marks the k-th subset used as testing data]
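In k-fold cross-validation the dataset is cut into k subsets; each experiment holds one subset out for testing and trains on the rest, and the reported figure is the average over the k experiments. The sketch below generates the index splits and averages the ten accuracies listed on the slide. The dataset size of 20 and the function name are assumptions for illustration.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, test_indices) for each of the k folds."""
    # Spread any remainder over the first folds so that sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=20, k=10))  # hypothetical dataset of 20 rows

# Accuracies of the ten experiments as listed on the slide (in percent).
accuracies = [93, 91, 90, 93, 93, 91, 94, 93, 91, 90]
average = sum(accuracies) / len(accuracies)
print(f"average accuracy: {average:.1f}%")
```

Every row is used for testing exactly once, which is why the averaged accuracy is a steadier estimate than a single train/test split.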
  • 73. Workflow: Pattern Mining and Analytics Process
    Application Programming Interface (API) Crawling Process > Network Structure (Social Network Analysis) > Content Analysis (Text Analytics)
  • 77. Topic Modelling
      - Topic modelling is a type of statistical modelling for discovering the abstract “topics” that occur in a collection of documents.
      - LDA (Latent Dirichlet Allocation) is the most popular (and typically most effective) topic modelling technique.
    [Figure: First Topic Identified]
  • 78. TOP BRAND ALTERNATIVE MEASUREMENT BASED ON CONSUMER NETWORK ACTIVITY
    Abstract: In Business Intelligence efforts, the legacy methodology for measuring product brand awareness uses techniques such as surveys, interviews, and questionnaires. This methodology requires expensive effort to collect data from respondents and takes considerable time to accomplish. The availability of Big Data in the form of social media interaction can benefit us. The conversations and user-generated content from social media can be used to measure brand awareness through consumer activity. We use Social Network Analysis methodology to measure the dynamics and evolution of brand conversations in social media. By comparing the network properties, we propose new alternative measurement methods of product brand awareness. Our proposed methodology is better adapted to large-scale conversational data in social media. This measurement also enhances the current methodology by viewing consumer opinions as a whole network and not as separate individuals. The study was conducted on social networking conversations on Twitter using two industry case studies: mobile operators and mobile phone brands in Indonesia.
    [Figures: mobile phone rank; mobile operator rank]
  • 79. A COMPARISON OF INDONESIA E-COMMERCE SENTIMENT ANALYSIS FOR MARKETING INTELLIGENCE EFFORT. CASE STUDY: BUKALAPAK, TOKOPEDIA, ELEVENIA
    Abstract: The rapid growth of the e-commerce market in Indonesia has led to the appearance of various e-commerce companies and to high competition among them. Marketing intelligence is an important activity for measuring competitive position. One element of marketing intelligence is assessing customer satisfaction. Many Indonesian customers express their satisfaction or dissatisfaction with a company through social media. Hence, social media data provides a new practical way to support marketing intelligence efforts. This research performs sentiment analysis using the naive Bayes classifier with TF-IDF weighting. We compare the sentiments towards the top-3 most visited e-commerce sites: Bukalapak, Tokopedia and Elevenia. We use Twitter data for sentiment analysis because it is faster, cheaper and easier from both the customer and the researcher side. The purpose of this research is to find out how to process the huge volume of customer sentiment on Twitter into useful information for the e-commerce companies, and which of those top-3 e-commerce companies has the highest level of customer satisfaction. The experiment results show that the method can be used to classify customer sentiments on Twitter automatically and that Elevenia is the e-commerce company with the highest customer satisfaction.
    [Figure: comparable results among the three case studies]
  • 80. NETWORK TEXT ANALYSIS TO SUMMARISE ONLINE CONVERSATIONS FOR MARKETING INTELLIGENCE EFFORTS IN TELECOMMUNICATION INDUSTRY
    Abstract: Tight market competition puts pressure on companies to employ new and faster ways to support their marketing intelligence efforts. Marketing intelligence includes gathering and analysing data for confident decision making about the market and its competition. Today, the abundant large-scale data from online social network services has made it possible to extract valuable information such as user opinions and sentiment from conversations in the market. As competition rises, new challenges emerge, including faster data summarisation. The common practice for summarising content is to use a wordcloud or a weighted list of word appearances. This approach lacks sense and contextual relations between the words in question, because the words have no connection with other words that might construct an important phrase. With the help of graph formulation, we propose a methodology of network text analysis to summarise large conversations in online social network services. The proposed methodology captures complex relations between words while still maintaining fast summarisation. In this paper, we compare three major telecommunication providers in Indonesia: Telkomsel, XL and Indosat. The conversations about those brands on the online social network service Twitter are collected, and a text network for each brand is constructed and analysed.
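The abstract does not spell out how the text network is built, so the following is only one plausible reading: treat words as nodes and connect two words whenever they appear in the same message, weighting the edge by how often that happens. The sample messages and the function name are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(messages):
    """Build weighted, undirected word-to-word edges from same-message co-occurrence."""
    edges = Counter()
    for message in messages:
        # Use each distinct word once per message; sorting gives a stable edge key.
        words = sorted(set(message.lower().split()))
        edges.update(combinations(words, 2))
    return edges


# Hypothetical conversation snippets about a telecommunication brand.
messages = [
    "slow internet tonight",
    "internet slow again",
    "great signal tonight",
]
network = cooccurrence_network(messages)
strongest_edge, weight = network.most_common(1)[0]
print(strongest_edge, weight)
```

Unlike a wordcloud, the edge list keeps word pairs together, so a phrase such as "slow internet" stays visible as a heavily weighted connection.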
  • 81. NETWORK MARKET ANALYSIS USING LARGE SCALE SOCIAL NETWORK CONVERSATION OF INDONESIA FAST FOOD INDUSTRY Abstract - The high competitiveness of the Indonesian fast food market has forced the industry to find new ways to understand market behaviour. The new challenge includes faster data collection and analysis, preferably delivered in close to real time. The common practice of gathering market data through questionnaires and interviews is considered expensive and time-consuming compared to mining online conversations within the respective brand community. With the availability of large-scale data from online social network services (oSNS), we can extract valuable information that represents the dynamic behaviour of the market. Many brands maintain a presence in oSNS as part of their customer relationship management (CRM) effort. The social interactions formed in oSNS can be modelled using Social Network Analysis (SNA) methodology. In this paper, we compare the brand communities of two head-to-head competitors in the fast food industry, McDonald’s and Burger King. The SNA model constructs a large-scale network with close to a million nodes and edges. The result gives us insight into what is important for understanding the dynamic market, beyond the market size represented by the community conversations.
  • 82. SOCIAL NETWORK AND SENTIMENT ANALYSIS FOR SOCIAL CUSTOMER RELATIONSHIP MANAGEMENT IN INDONESIA BANKING SECTOR SCRM Network BCA BNI MANDIRI Abstract - The increasing number of social media users affects both individual and corporate users. The banking sector, for example, uses social media to support its Social Customer Relationship Management (SCRM) activity. We investigate the dynamics and evolution of the conversation network between bank customers using Social Network Analysis methodology. We calculate the network properties to characterise the network and see how active it is. Customers talking about banks’ services can also express their opinions on social media. Therefore we perform sentiment analysis to classify customers’ opinions into positive, negative and neutral classes. This research was performed on Twitter conversations about Bank Mandiri, Bank Central Asia (BCA) and Bank Negara Indonesia (BNI). The results are beneficial for business intelligence purposes to support decision making.
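The abstract mentions calculating network properties without listing them. As a hedged sketch, the function below computes three common ones (size, density and average degree) for an undirected conversation network; the choice of properties and the sample accounts are assumptions.

```python
def network_properties(edges):
    """Basic properties of an undirected network given as (user_a, user_b) pairs.

    Repeated pairs count once and self-loops are ignored.
    """
    nodes = set()
    unique_edges = set()
    for a, b in edges:
        if a == b:
            continue
        nodes.update((a, b))
        unique_edges.add(frozenset((a, b)))
    n, m = len(nodes), len(unique_edges)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0  # share of possible links present
    average_degree = 2 * m / n if n else 0.0
    return {"nodes": n, "edges": m, "density": density, "average_degree": average_degree}

# Hypothetical mention pairs around a bank's account.
mentions = [
    ("ani", "bank"),
    ("budi", "bank"),
    ("citra", "bank"),
    ("ani", "budi"),
    ("bank", "ani"),  # same pair as the first one, counted once
]
props = network_properties(mentions)
print(props)
```

Comparing these values across the three banks, or across time windows, is one simple way to see which conversation network is more active.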
  • 83. MEASURING MARKETING COMMUNICATIONS MIX EFFORT USING MAGNITUDE OF INFLUENCE AND INFLUENCE RANK METRIC Abstract: In the context of modern marketing, Twitter is considered a communication platform for spreading information. Many companies create and acquire several Twitter accounts to support and perform a variety of marketing mix activities. Initially, each account is used to capture a specific market profile. Together, the accounts create a network of information that provides consumers with the information they need, depending on their contextual utilisation. With many accounts available, the fundamental question is how to measure the influence of each account in the market based not only on their relations, but also on the effects of their postings. The Magnitude of Influence (MOI) metric is adapted together with the Influence Rank (IR) measurement of accounts in their social network neighbourhood. We use a social network analysis approach to analyse 65 accounts in the social network of an Indonesian mobile phone network operator, Telkomsel, which are involved in marketing communications mix activities through a series of related tweets. The social network gives an idea of the activity in building and maintaining relationships with the target audience. This paper shows the most potential accounts based on network structure and engagement. Based on this research, the more followers an account has, the more responsibility it has to generate interaction from its followers in order to achieve the expected effectiveness. The focus of this paper is to determine the most potential accounts in the application of the marketing communications mix on Twitter. ratio of affection; magnitude of influence; LCRT function; influence rank (based on PageRank)
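The slide notes that Influence Rank is based on PageRank. The formulas for MOI and IR are not given here, so the sketch below shows only plain PageRank by power iteration, on a hypothetical network of four accounts where an edge u -> v means account u mentions account v. The damping factor 0.85 is the conventional default, not a value taken from the paper.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Plain PageRank. links maps each node to the list of nodes it points to."""
    nodes = set(links) | {v for targets in links.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            targets = links.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                for v in nodes:
                    new_rank[v] += damping * rank[u] / n
        rank = new_rank
    return rank

# Hypothetical account names, invented for illustration.
links = {
    "promo": ["main"],
    "care": ["main"],
    "news": ["main", "care"],
    "main": ["care"],
}
rank = pagerank(links)
for account, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(account, round(score, 3))
```

Accounts that are mentioned by other well-mentioned accounts rise to the top, which is the structural part of influence; the engagement part (the effect of postings) would come from the MOI metric.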
  • 84. MAPPING ONLINE TRANSPORTATION SERVICE QUALITY AND MULTI-CLASS CLASSIFICATION PROBLEM SOLVING PRIORITIES CASE STUDY: GOJEK AND GRAB Abstract. Online transportation services are known for their accessibility, transparency, and affordable tariffs. These points give online transportation advantages over existing conventional transportation services. Online transportation is an example of a disruptive technology that changes the relationship between customers and companies. In Indonesia, there is strong competition among online transportation providers, hence the companies must maintain and monitor their service level. To understand their position, we apply both sentiment analysis and multiclass classification to customer opinions. From negative sentiments, we can identify problems and establish problem-solving priorities. As a case study, we use the most popular online transportation providers in Indonesia: Gojek and Grab. Since many customers actively post compliments and complaints about the companies’ service level on Twitter, we collect 61,721 tweets in Bahasa during one month of observation. We apply Naive Bayes and Support Vector Machine methods to see which model performs best on our data. The results reveal that Gojek has better service quality, with 19.76% positive and 80.23% negative sentiments, than Grab, with 9.2% positive and 90.8% negative. Gojek’s highest problem-solving priority concerns application problems, while Grab’s concerns unusable promos. The overall result shows that the general problems of both case studies are related to the accessibility dimension, which indicates a lack of capability to provide good digital access to end users.
  • 85. HYBRID SENTIMENT AND NETWORK ANALYSIS OF SOCIAL OPINION POLARIZATION Abstract: The rapid growth of social media and user-generated content (UGC) has provided a rich source of potentially relevant data. The problem is how to summarise those data in order to understand them and transform them into information. Twitter, as one of the most popular social networking and micro-blogging services, can be analysed in terms of the content produced using sentiment analysis. On the other hand, several types of networks can also be constructed to analyse the social network structure and network properties. This research intends to combine the content and structural approaches into a hybrid approach for identifying social opinion polarisation in the form of a conversation network. Sentiment analysis is used to determine public sentiment, and social network analysis is used to analyse the structure of the network, detecting communities and influential actors in the network. Using this hybrid approach, we gain a comprehensive understanding of social opinion polarisation. As a case study, we present real social opinion polarisation about the reclamation issue in Indonesia.
  • 86. DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELLING Case Study: Uber Digital flows now exert a larger impact and the world is more connected than ever: the amount of cross-border bandwidth in use has grown 45 times since 2005. With the massive amount of data spreading across the net, including social media, speed is one of the most essential factors in business. Companies can take advantage of social media as a source for analysing and extracting customers’ opinions, so that they can respond quickly to changing conditions. The main purpose of this research is content analysis; to achieve it, we need to extract the information as well as summarise the topics inside it. However, for quick content analysis there is a variety of tools, each with its specific output, which creates challenges in the process. We use Naïve Bayes sentiment analysis on a daily time series and topic modelling based on Latent Dirichlet Allocation (LDA) to evaluate the sentiment as well as to model the topics discussed. The purpose of this research is to help both companies and individuals map public opinion towards a certain topic by analysing the sentiment of the text and creating a topic model. Real-time information for determining consumer opinion therefore becomes a crucial part. Twitter can serve this purpose as a source of real-time information from user-generated content. We pick Uber as the case study, as it is viewed as one of the most favoured transportation methods in most parts of the world. The data collection period is from 10 February 2017 to 28 February 2017, with 1,048,576 tweets collected.
  • 87. ANALYSING EMPLOYEE VOICE USING REAL-TIME FEEDBACK Abstract: People nowadays tend to use social media as a platform to share their reviews, emotions, and opinions, including about their jobs. Thus, a lot of data is available on the web, and a rapid response is needed to analyse and interpret it. Unfortunately, many organisations still use annual surveys to assess satisfaction, engagement, and culture in the workplace. By using the interpreted data, decision-makers can make decisions more effectively and efficiently than with conventional datasets such as company surveys and questionnaires. This may be done with the help of sentiment analysis. In this research, we classify feedback by category and sentiment. Several classification algorithms are used in opinion mining; two of them are the Naive Bayes Classifier (NBC) and the Support Vector Machine (SVM). This paper aims to classify feedback based on sentiment using NBC and SVM. *ICST, 2018
  • 88. MONTE CARLO SIMULATION AND CLUSTERING FOR CUSTOMER SEGMENTATION IN BUSINESS ORGANISATION Abstract: Utilising data for segmentation analysis offers a streamlined way to gain potential insight for decision making support in a business organisation. Using an appropriate data analytics technique helps organisations profile their customer segments accurately, which in turn leads to an effective marketing strategy. However, there are times in data analytics when the organisation needs another variable whose value is unavailable, for example customer income data, which is mostly hard to collect. Using Monte Carlo simulation, the value of customer income can be generated and then compared with customer spending to construct a customer segmentation model. An unsupervised customer segmentation model using K-Means clustering enables us to see the grouping patterns of customers’ income against their spending. Clusters in the dataset can be interpreted as groups of customers with similar characteristics. This paper shows how to generate customer income data and create data clusters in order to optimise customer potential by utilising data. Furthermore, the result gives us insight into which groups of customers might not be served properly, considering their average income and their spending behaviour.
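The two steps described above, generating the missing income variable by Monte Carlo simulation and then clustering income against spending with K-Means, can be sketched as follows. The normal income distribution, its parameters, the uniform spending data and k = 3 are all assumptions chosen for illustration; the abstract does not state them.

```python
import random

def simulate_incomes(n, mean, stdev, rng):
    """Monte Carlo draw of n incomes from a normal distribution, floored at zero."""
    return [max(0.0, rng.gauss(mean, stdev)) for _ in range(n)]

def kmeans(points, k, rng, iterations=50):
    """Basic K-Means on 2-D points; returns (centroids, clusters)."""
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters

rng = random.Random(42)  # fixed seed so the run is repeatable
spending = [rng.uniform(1.0, 10.0) for _ in range(200)]          # hypothetical observed spending
incomes = simulate_incomes(200, mean=8.0, stdev=3.0, rng=rng)    # simulated, not observed
points = list(zip(incomes, spending))
centroids, clusters = kmeans(points, k=3, rng=rng)
for centre, members in zip(centroids, clusters):
    print(f"income={centre[0]:.2f} spending={centre[1]:.2f} customers={len(members)}")
```

A cluster with high simulated income but low spending would be a candidate for the under-served group mentioned in the abstract, with the caveat that the income values are simulated.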
  • 89. MAPPING ORGANISATION KNOWLEDGE NETWORK AND SOCIAL MEDIA BASED REPUTATION MANAGEMENT Abstract - Knowledge management and reputation are important aspects of an organization, especially in the ICT industry. Controlling knowledge management and modeling personal reputation through social media is essential for the organization, because it shows how employees build relationships with their peer networks or clients virtually and how the knowledge network can support organizational performance. The purpose of this research is to map the knowledge network and formulate reputation in order to fully understand how knowledge flows in an organization and whether employee reputation has a higher degree of influence in the organization's knowledge network. In particular, we develop formulas to measure the knowledge network and personal reputation based on social media activities. As a case study, we pick an Indonesian ICT company that actively builds its business around its employees' peer knowledge outside the company. For the knowledge network, we collect data by conducting interviews. For reputation management, we crawl data from several popular social media. We base our work on Social Network Analysis methodology. The result shows that employees' knowledge is directly proportional to their reputation, but reputation levels differ across the social media observed in this research. reputation formula for Twitter, Instagram and LinkedIn
  • 90. PREDICTION MODELS BASED ON FLIGHT TICKETS AND HOTEL ROOMS DATA SALES FOR RECOMMENDATION SYSTEM IN ONLINE TRAVEL AGENT BUSINESS Abstract - Indonesia is one of the favorite vacation destinations of domestic and foreign travelers, so the value of investment in its tourism industry has continued to grow significantly. This has created more Online Travel Agent businesses in recent years. However, many conventional business travel and Umrah travel agencies in Indonesia are threatened with bankruptcy as online travel businesses spread into the conventional market for ticket sales and travel tours. The case study in this research differs from Online Travel Agent businesses in general because it works with real-time analytics, using flight ticket and hotel room sales data to create prediction or recommendation models. Data mining, the extraction of hidden predictive information from large databases, is a powerful technique with great potential to help companies focus on the most important information in their data warehouses. Using the classification method in data mining, the objective of this paper is to create predictive models from flight ticket and hotel room sales data using the decision tree classification approach. The result is beneficial for businesses, as it can serve as the basic algorithm for programming an Online Travel Agent recommendation feature.
  • 91. EFFECTIVE KNOWLEDGE MANAGEMENT USING BIG DATA AND SOCIAL NETWORK ANALYSIS Visualization of hierarchical structure organization and knowledge flow of informal organization Abstract: Knowledge management consists of identifying, creating, representing, distributing, and enabling adoption of insights and experiences in an organization. One approach to modeling knowledge management is to use a network model. Big Data is an important part of the ICT technology roadmap, whose main function is modelling behaviour and supporting organizational decisions. Social Network Analysis is a micro version of Big Data in which we can model and quantify social networks. In this paper we show how Social Network Analysis can help an organization apply Knowledge Management strategies and practices, through an experiment on a real-world large dataset containing 360000+ email exchanges between 36000+ employees inside an organization. business case resolved using SNA methodology; map of full network of email exchange between employees in Enron
  • 92. INDONESIA INFRASTRUCTURE AND CONSUMER STOCK PORTFOLIO PREDICTION USING ARTIFICIAL NEURAL NETWORK BACKPROPAGATION *ICOICT, 2017 Abstract: The Artificial Neural Network (ANN) method is increasingly popular for building predictive models that generate small prediction errors. To produce a good model, ANN needs a large dataset as input. ANN backpropagation is a gradient descent method that minimises the squared output error. Stock price movements suit the ANN requirement: they form a large data set because stock prices are recorded as often as every second, usually called high-frequency data. The implementation of stock price prediction using the ANN approach is quite new. The predictive model helps investors build a stock portfolio and supports their decision making process. Buying several stocks in a portfolio decreases diversifiable risk and increases the chance of a higher return. In this paper, we show how to generate a prediction model of stock prices using artificial neural network backpropagation, and how to form a portfolio from the predicted prices such that the portfolio prediction has the smallest error. The data set we use is historical stock price data from ten different company stocks in the infrastructure and consumer sectors of the Indonesia Stock Exchange. The result is that, for the lower risk condition, the ANN predictive model gives a higher expected return than the return from the real condition, while for higher risk, the return from the real condition is higher than that of the ANN predictive model.
  • 93. THE DYNAMIC OF BANKING NETWORK TOPOLOGY Case Study: Indonesian Presidential Election Event ABSTRACT - Information and communication technologies have brought major changes in data storage and processing. Data of various types and in high volume has been digitalised, supporting mining-based data processing that provides knowledge in a modern and efficient way. Banking transaction data is stored digitally and is suitable for the mining process, especially in a network science model. Understanding transaction system risk requires fundamental study of payment flows and bank behaviour in various situations. The failure of Lehman Brothers, whose contagion spread in a short time, indicates that financial markets are interdependent and connected to each other in a large network. Thus, a network approach to the overall system becomes more important than looking at a single bank. Political conditions greatly affect economic stability, including the banking and financial sectors. A presidential election is a major political event for a nation and affects community sentiment and the financial market. However, the linkage between political events and topological changes is poorly understood. This research presents an insight into event-driven dynamic network topology, with banking transactions as a case study. We examine the dynamics of the banking transaction network topology driven by the 2014 Indonesian presidential election. We discover that banks are more engaged with others, at larger values, 3 days before the end of the campaign period, and less engaged with others, at smaller values, at the end of the campaign period. Unique transaction activity between banks remains stable, with a slight decline at the end of the campaign period. This scenario provides the possibility of learning the banking transaction pattern and supporting the supervision of financial system stability.
  • 94. A COMPARATIVE STUDY OF EMPLOYEE CHURN PREDICTION MODEL Abstract - The churn phenomenon commonly occurs in customer loyalty towards a brand's products or services. It has become a critical issue that any industry makes its best effort to avoid. The churn problem may also arise within the organisation, where it is called employee churn. Employee churn has many adverse effects on the organisation, as it correlates with unfair workload distribution, a great deal of money lost, and the extra time needed to find a replacement, which may result in a rise in the customer dissatisfaction rate. The purpose of this study is to find the best model to predict employee churn. A successful prediction model for employee churn is needed in order to avert various negative impacts on the organisation. There are three popular classification models for prediction, namely naïve Bayes, decision tree, and random forest. This study compares the performance of these models using Human Resource Information System (HRIS) data from one of Indonesia’s renowned telecommunication companies. The data collected for the study spans a 2-year period, from 2015 until 2017. The findings suggest that the best classification model is random forest, with an accuracy of 97.5%. The second-best method is naïve Bayes with 96.6%, and the model with the lowest accuracy is decision tree with 88.7%. The study concludes that the most reliable and accurate classification model to predict employee churn is random forest.
  • 96. STATISTICS vs. DATA ANALYTICS: Confirmative vs. Explorative; Small Data Set vs. Large Data Set; Small Number of Variables vs. Large Number of Variables; Deductive (no predictions) vs. Inductive; Numeric Data vs. Numeric and Non-Numeric Data; Clean Data vs. Data Cleaning. Complementary Methods
  • 97. Opportunities: • big data provides granular, micro data • big data provides a relatively fast and cheap process • research opportunities in data science methods, implementation and evaluation maturity • data scientists help big data initiatives towards future and sustainable economic activities • uncovering hidden truths and democratisation by data are the primary objectives of a data scientist Challenges: • hard to find data scientist talent • high cost to maintain data scientist talent • big data is often a population study, so no sampling error => methods familiarity • benefit > data quality + costs + security • ML result credibility (different algorithm, different conclusion)
  • 101. without Big Data, you are blind and deaf in the middle of a freeway - Geoffrey Moore -