BIG	DATA	ANALYTICS
UNDERSTANDING	FOR	RESEARCH	ACTIVITY
Dr. Andry Alamsyah
Asosiasi Ilmuwan Data Indonesia
School of Econom...
Research Field :
Social	Computing,	Social	Network,	Complex	Network	/	Network	Science,	Computational	Social	
Science,	Data	...
• Background	and	Motivation		
• Big Data Definition and Related Field
• Understanding	Pattern	
• Data	Analytics	/	Machine...
Background	&	Motivation
Remember	This	?
>	information	overload,			
>	technological	based	society	
>	acquire	new	value	=>	new	culture
>	empowered	individuals	
>	mo...
Storytelling
Contextual	Story
• Industry 4.0 -> cyber physical system -> enabling humans to produce large-scale data -> human behaviour quantification
...
Competing	Ecosystem	&	Data
Cheap	Change	Everything
efficient economy
new value proposition
• cutting through the BIG DATA hype
• cheap means everywher...
Big Data Definition and
Related Field
Big Data Definition
•a term => describe extremely large amounts of structured and
unstructured data

•the activity => capt...
Volume, Variety, and Velocity are the "essential" characteristics of Big Data
Veracity, and Value are the "quality" of Big D...
DATA ANALYTICS
-the discovery, interpretation, and communication of meaningful patterns in data (wikipedia)
-the process t...
Data	Analytics
• The discovery, interpretation, and communication of meaningful patterns in data (wikipedia)

• The proces...
Predictive	Analytics
• study the past if you want to study the future (confucius)

• Predictive Analytics is the art of bu...
Data	Science
Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to
e...
Data	Science
Body	of	Knowledge
CRISP-DM
CRISP-DM -> Cross -Industry Standard Process for Data Mining is an open standard process model that
describes com...
Structure	Data	Type
Column Value
Patient Andry Alamsyah
Date of Birth 12/07/1995
Date Admitted 02/03/2019
“The patient came ...
Working	with	Structured	Data
Working	with	Unstructured	Data
brand A brand B
(Big)	Data	Opportunity
Understanding	Pattern
Understanding	Pattern
Structured Data
Mapping Position
Understanding	Pattern
Unstructured Data
Friendship Network
Understanding	Pattern
Unstructured Data
Growth Friendship Network
Understanding	Pattern
Unstructured Data
Conversational Network
Understanding	Pattern
Structured Data
Regional Economic Value
based on Checkin
Mechanism
How Can (Big) Data Analytics Help?
by describing the phenomenon,
by predicting the value,
by estimating the future outcom...
Data	Analytics	/	Machine	Learning	
Fundamentals	(Prediction	and	
Recommendation)
• Machine learning is defined as an automated process that extracts
patterns from data to build the models used in predicti...
Machine	Learning
Machine Learning is an idea to learn from
examples and experience, without being
explicitly programmed.
I...
Machine	Learning
Machine	 learning	 (ML)	 is	 the	 science	 of	
getting	 computers	 to	 act	 without	 being	
explicitly	pr...
Learning	Methodology
Machine	Learning	in	Business
Finance and Banking
• Credit scoring
• Fraud detection
• Risk Analysis
• Portfolio Optimizati...
1.Formula	/	Function	
• T	=	0.48O	+	0.23TL	+	0.5D	
2.Decision	Tree	
3.	Correlation	or	Association	
4.Rule		
• IF	IPS3=2.8	...
Learning Illustration
[Figure: data points labelled A and B; Data -> Two Possible Solutions (1 and 2)]
•It is based on a labeled training set.
•The class of each piece of data in
training set is known.
•Class labels are pre-d...
Supervised	Learning
•Prediction	methods	are	commonly	referred	to	as	
supervised	learning.	Supervised	methods	are	
thought	...
Supervised	Learning
Problems	:	
• Classification
• The domain of the target attribute is finite and categorical.
• A	cla...
Supervised	Learning
Supervised	Learning
Class/Label/Target Attribute/Feature
Nominal
Numeric
Unsupervised	Learning
•Input	:	set	of	patterns	P,	from	n-dimensional	space	S,	but	little	or	no	
information	about	their	cl...
Unsupervised	Learning
Problems	:	
• Clustering	
• Association	Rules	
• Pattern	Mining	
• It	is	adopted	as	a	more	general	t...
Unsupervised	Learning
Unsupervised	Learning
Attribute/Feature
Background	:	
• How	to	learn	a	new	skill	
• Learning	and	intelligence	
• Interaction	with	environment	
• Goal-oriented	lea...
The	Analogy	
• A	child	learns	to	walk	
• The	child	is	an	agent	trying	to	manipulate	the	
environment	
• The	child	is	takin...
Reinforcement	Learning
Various	Practical	applications	of	Reinforcement	Learning		
• RL	can	be	used	in	robotics	for	industr...
Data	Preparation	(CRISP-DM)
Data Preprocessing
• Measures for data quality: A multidimensional view
• Accuracy: correct or...
1.Data	Cleaning
a. Fill	in	missing	values
b. Smooth	noisy	data
c. Identify or remove outliers
d. Resolve	inconsistencies
2....
Common	Data	
Analytics	Rules
Tasks Description Algorithms Examples
Classification Predict if data points belongs to one
of t...
Estimation
Customer Order Number	of	Traffic	Light Distance Travel	Time
1 3 3 3 16
2 1 7 4 20
3 2 4 6 18
4 4 6 8 36
...
1000 ...
Predictions
stock price dataset in
time series format
label
prediction using
Neural Network
Learning
prediction plot
Classification
Student ID (NIM), Gender, National Exam Score (Nilai UN), School of Origin (Asal Sekolah), IPS1, IPS2, IPS3, IPS4, ..., On-Time Graduation (Lulus Tepat)
10001 L 28 SMAN	2 3.3 3.6 2.89 2.9 Y...
input : golf playing recommendation
output (rules) :
If	outlook	=	sunny	and	humidity	=	high	then	play	=	no

If	outlook	=	r...
Clustering
dataset without label
learning using K-means
clustering methods
Association
learning using FP-Growth
association methods
1. Estimation:
- Linear Regression, Neural Network, Support Vector Machine, etc
2. Prediction/Forecasting:
- Linear	Regression,	...
Based	on	Information	Theory,	for	example	in	Decision	Tree	model	
induced	by	the	concept	of	entropy	and	information	gain.
I...
Similarity-Based	Learning
Training
Records
Test Record
Compute
Distance
Basic Idea => If it walks like a duck, quacks like ...
Probability-based	prediction	approaches	are	heavily	based	on	Bayes’	Theorem
Probability-Based	Learning
•	A	probabilistic	f...
perform	a	search	for	a	set	of	parameters	for	a	parameterised	model	that	minimises	the	
total	error	across	the	predictions	...
Model	Evaluation
1.Estimation:	
- Error:	Root	Mean	Square	Error	(RMSE),	MSE,	MAPE,	etc	
2.Prediction/Forecasting	
- Error:...
learning and evaluation process confusion matrix
PREDICTED CLASS
ACTUAL

CLASS
Class=Yes Class=No
Class=Yes a b
Class=No c...
Model	Evaluation
• Learning	curve	shows	how	
accuracy	changes	with	varying	
sample	size	
• Requires	a	sampling	schedule	fo...
Increase	Coverage
Experiment Dataset Accuracy
1 93%
2 91%
3 90%
4 93%
5 93%
6 91%
7 94%
8 93%
9 91%
10 90%
Average	Accurac...
The	Future	ML	Trends
artificial neural network
convolutional neural network
deep learning
Social	Media	Analytics
Workflow
Application
Programming
Interface
(API)
Crawling
Process
> Network Structure
(Social Network Analysis)
> Content ...
Social	Network	Analysis
Content	Analysis
Text	Network
First Topic
Identified
Topic	Modelling
•Topic modelling is a type of statistical modelling for discovering the abstract
“t...
TOP BRAND ALTERNATIVE MEASUREMENT BASED ON
CONSUMER NETWORK ACTIVITY
Abstract:
In Business Intelligence effort, the legacy...
A COMPARISON OF INDONESIA E-COMMERCE SENTIMENT ANALYSIS FOR
MARKETING INTELLIGENCE EFFORT
CASE STUDY : BUKALAPAK, TOKOPEDI...
NETWORK TEXT ANALYSIS TO SUMMARISE ONLINE CONVERSATIONS FOR
MARKETING INTELLIGENCE EFFORTS IN TELECOMMUNICATION
INDUSTRY
A...
NETWORK MARKET ANALYSIS USING LARGE SCALE SOCIAL
NETWORK CONVERSATION OF INDONESIA FAST FOOD INDUSTRY
Abstract - The high ...
SOCIAL NETWORK AND SENTIMENT ANALYSIS FOR SOCIAL CUSTOMER
RELATIONSHIP MANAGEMENT IN INDONESIA BANKING SECTOR
	 	 	
SCRM N...
MEASURING MARKETING COMMUNICATIONS MIX EFFORT USING
MAGNITUDE OF INFLUENCE AND INFLUENCE RANK METRIC
Abstract:	In	the	cont...
MAPPING ONLINE TRANSPORTATION SERVICE QUALITY AND MULTI-CLASS
CLASSIFICATION PROBLEM SOLVING PRIORITIES
CASE STUDY : GOJEK...
HYBRID SENTIMENT AND NETWORK ANALYSIS OF SOCIAL
OPINION POLARIZATION
Abstract:	The	rapid	growth	of	social	media	and	user	g...
DYNAMIC LARGE SCALE DATA ON TWITTER USING
SENTIMENT ANALYSIS AND TOPIC MODELLING
Case Study: Uber
Digital flows now exert a...
ANALYSING EMPLOYEE VOICE USING REAL-TIME FEEDBACK
Abstract People nowadays tend to use social media as a platform to share...
MONTE CARLO SIMULATION AND CLUSTERING FOR
CUSTOMER SEGMENTATION IN BUSINESS ORGANISATION
Abstract: Utilising data for segme...
MAPPING ORGANISATION KNOWLEDGE NETWORK AND
SOCIAL MEDIA BASED REPUTATION MANAGEMENT
Abstract—Knowledge management and repu...
PREDICTION MODELS BASED ON FLIGHT TICKETS AND HOTEL ROOMS DATA
SALES FOR RECOMMENDATION SYSTEM IN ONLINE TRAVEL AGENT BUSIN...
EFFECTIVE KNOWLEDGE MANAGEMENT USING
BIG DATA AND SOCIAL NETWORK ANALYSIS
Visualisation of hierarchical structure organizatio...
INDONESIA INFRASTRUCTURE AND CONSUMER STOCK PORTFOLIO
PREDICTION USING ARTIFICIAL NEURAL NETWORK BACKPROPAGATION
*ICOICT, ...
THE DYNAMIC OF BANKING NETWORK TOPOLOGY
Case Study: Indonesian Presidential Election Event
ABSTRACT - Information and comm...
A COMPARATIVE STUDY OF EMPLOYEE CHURN PREDICTION
MODEL
Abstract - Churn phenomenon commonly occurs in customer loyalty tow...
Conclusion
STATISTICS DATA ANALYTICS
Confirmative Explorative
Small Data Set Large Data Set
Small Number of Variable Large Number of V...
• big	data	provide	granular,	micro	data	
• big	data	provide	relatively	fast	and	cheap	process	
• research	opportunity	on	d...
Data	Visualisation
The	Power	of	Data	is	…
every	breath	you	take	
every	move	you	make	
every	bond	you	break	
every	step	you	take	
I'll be watc...
without Big Data, you are blind
and deaf in the middle of a freeway
- Geoffrey Moore -
Big Data Analytics : Understanding for Research Activity

Big Data Analytics Presentation at International Workshop Colloquium Exploring Research Opportunity. School of Business and Management (SBM) - ITB. Bandung, 8 August 2019.

Big Data Analytics : Understanding for Research Activity

  1. 1. BIG DATA ANALYTICS: UNDERSTANDING FOR RESEARCH ACTIVITY. Dr. Andry Alamsyah, Asosiasi Ilmuwan Data Indonesia; School of Economics and Business, Telkom University
  2. 2. Research Field : Social Computing, Social Network, Complex Network / Network Science, Computational Social Science, Data Analytics, Big Data, Data Mining, Graph Theory, Disruptive Innovation / Disruptive Economy, ICT Entrepreneurial Business, Data / Information Business Andry Alamsyah • Researcher / Data Scientist • Director of Digital Business Ecosystem Research Centre • Chief and Founder of Lab. Social Computing & Big Data • Chairman & Founder Indonesian Data Scientist Society (AIDI) email andry.alamsyah@gmail.com blog andrya.staff.telkomuniversity.ac.id repository telkomuniversity.academia.edu/andryalamsyah repository researchgate.net/profile/Andry_Alamsyah linkedin linkedin.com/andry.alamsyah twitter twitter.com/andrybrew Education : S1 : Mathematics - ITB, Topic: Statistics S2 : Informatics - UPJV, France, Topic: Information System, and Multimedia S3 : Electro and Informatics - ITB, Topic: Social Network, and Big Data Links : Introduction
  3. 3. Agenda • Background and Motivation • Big Data Definition and Related Field • Understanding Pattern • Data Analytics / Machine Learning Fundamentals (Prediction and Recommendation) • Social Media Analytics (by Case Study) • Conclusion • Working on Your Computer (Machine Learning Practice)
  4. 4. Background & Motivation
  5. 5. Remember This ?
  6. 6. > information overload, > technological based society > acquire new value => new culture > empowered individuals > more data available > building contextual story / search Digital Ocean
  7. 7. Storytelling
  8. 8. Contextual Story
  9. 9. Industry 4.0 • Industry 4.0 -> cyber physical system -> enabling humans to produce large-scale data -> human behaviour quantification • Key Technologies : data, computational power and connectivity; analytics and intelligence; human machine interaction; advanced production methods the environment Deloitte, Industry 4.0
  10. 10. Competing Ecosystem & Data
  11. 11. Cheap Change Everything efficient economy new value proposition • cutting through the BIG DATA hype • cheap means everywhere • cheap creates value • from cheap to strategy complex human behaviour market uncertainty business sustainability disruptive economic coopetitive, cooperative, competitive business ecosystem / platform programmable economy event driven API economy toward large-scale and massive socio-economic impact Industry 4.0
  12. 12. Big Data Definition and Related Field
  13. 13. Big Data Definition • a term => describes extremely large amounts of structured and unstructured data • the activity => capture / storage / processing / sharing / reporting of data => beyond the ability of legacy software tools and hardware infrastructure • related to many branches of "science" => data analytics, data science, machine learning, artificial intelligence, IoT, and many more • the application => in many fields => efficient, cost-effective, faster and more accurate decision making
      Units: Gigabyte 10^9 = 1,000,000,000; Terabyte 10^12 = 1,000,000,000,000; Petabyte 10^15 = 1,000,000,000,000,000; Exabyte 10^18 = 1,000,000,000,000,000,000; Zettabyte 10^21 = 1,000,000,000,000,000,000,000
      Drive comparison: 1990: store 1400 MB, transfer speed 4.5 MB/s, read drive ~ 5 minutes; 2010: store 1 TB, transfer speed 100 MB/s, read drive ~ 3 hours; Hadoop: 100 drives working at the same time can read 1 TB of data in 2 minutes
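The drive comparison on this slide can be checked with simple arithmetic. The sketch below is illustrative only: the function name `read_time_seconds` is hypothetical, decimal units (1 TB = 1,000,000 MB) are assumed, and the data is assumed to be split evenly across drives that read in parallel.

```python
def read_time_seconds(capacity_mb: float, speed_mb_per_s: float, drives: int = 1) -> float:
    """Time to read a whole dataset split evenly across `drives` disks read in parallel."""
    return capacity_mb / (speed_mb_per_s * drives)

MB_PER_TB = 1_000_000  # assumption: decimal units

drive_1990 = read_time_seconds(1_400, 4.5)                      # single 1990 drive
drive_2010 = read_time_seconds(MB_PER_TB, 100)                  # single 2010 drive
parallel_2010 = read_time_seconds(MB_PER_TB, 100, drives=100)   # 100 drives at once

print(f"1990 drive: {drive_1990 / 60:.1f} minutes")
print(f"2010 drive: {drive_2010 / 3600:.1f} hours")
print(f"100 drives in parallel: {parallel_2010 / 60:.1f} minutes")
```

Capacity grew far faster than transfer speed, so reading a full drive became slower; spreading the read over many drives is what brings the time back down to minutes.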
  14. 14. The 5 V's: Volume, Variety, and Velocity are the "essential" characteristics of Big Data. Veracity and Value are the "quality" of Big Data.
  15. 15. DATA ANALYTICS -the discovery, interpretation, and communication of meaningful patterns in data (wikipedia) -the process to uncover hidden patterns, unknown correlation, and other useful information that can help organisations make more informed business decision SOURCE review, opinion, historical data, conversation, network friendship, CCTV, Vlog, location tagging, etc BIG DATA large, fast, complex the 5V’s data DATA SCIENCE the science to extract knowledge / pattern from data SOCIAL COMPUTING quantification of human / social behaviour INSIGHT market segmentation, risk analytics information dissemination, recommended investment, fraud detection, personalised adv, customer acquisition and retention, purchase behaviour, early detection event, brand awareness, etc opportunity activity methodology benefit application Big Data Related Terms (Use Case)
  16. 16. Data Analytics • The discovery, interpretation, and communication of meaningful patterns in data (wikipedia) • The process to uncover hidden patterns, unknown correlation, and other useful information that can help organisations make more informed business decision predictive, descriptive, diagnostic, prescriptive.
  17. 17. Predictive Analytics • Study the past if you want to study the future (Confucius) • Predictive analytics is the art of building and using models that make predictions based on patterns extracted from historical data. Predictive analytics applications include: price predictions, dosage predictions, risk assessment, propensity/likelihood modelling, diagnosis, document classification • Prediction is the assignment of a value to any unknown variable. • A model is trained to make predictions based on a set of historical examples (we use Machine Learning).
  18. 18. Data Science Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.
  19. 19. Data Science
  20. 20. Body of Knowledge
  21. 21. CRISP-DM: the Cross-Industry Standard Process for Data Mining is an open standard process model that describes common approaches used by data mining experts. It is the most widely-used analytics model.
  22. 22. Structure Data Type. Structured (high degree of organization, such as a relational database): Column / Value: Patient = Andry Alamsyah; Date of Birth = 12/07/1995; Date Admitted = 02/03/2019. VS Unstructured (information that is difficult to organise using traditional mechanisms): “The patient came in complaining of chest pain, shortness of breath, and lingering headaches.. Smokes 2 packs a day.. Family history of heart disease.. Has been experiencing similar symptoms for the past 12 hours…”
  23. 23. Working with Structured Data
  24. 24. Working with Unstructured Data brand A brand B
  25. 25. (Big) Data Opportunity
  26. 26. Understanding Pattern
  27. 27. Understanding Pattern Structured Data Mapping Position
  28. 28. Understanding Pattern Unstructured Data Friendship Network
  29. 29. Understanding Pattern Unstructured Data Growth Friendship Network
  30. 30. Understanding Pattern Unstructured Data Conversational Network
  31. 31. Understanding Pattern Structured Data Regional Economic Value based on Checkin Mechanism
  32. 32. How Can (Big) Data Analytics Help? by describing the phenomenon, by predicting the value, by estimating the future outcome, by optimising the resources and the decision, by simulating all the possible scenarios ..
  33. 33. Data Analytics / Machine Learning Fundamentals (Prediction and Recommendation)
  34. 34. • Machine learning is defined as an automated process that extracts patterns from data to build the models used in predictive analytics applications. • A branch of artificial intelligence, concerned with the design and development of algorithms that allow computers to evolve behaviours based on empirical data. Machine Learning
  35. 35. Machine Learning: Machine Learning is the idea of learning from examples and experience, without being explicitly programmed. Instead of writing code, we feed data to a generic algorithm, and it builds logic based on the data given. • Traditional Programming: Data + Program -> Computer -> Output • Machine Learning: Data + Output -> Computer -> Program
  36. 36. Machine Learning: Machine learning (ML) is the science of getting computers to act without being explicitly programmed. ML has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. ML is pervasive today, we probably use it dozens of times a day without knowing it. It is the best way to make progress towards human-level AI. (Stanford/Coursera) ML is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. ML focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data. (whatis.com)
  37. 37. Learning Methodology
  38. 38. Machine Learning in Business Finance and Banking • Credit scoring • Fraud detection • Risk Analysis • Portfolio Optimization • Client Analysis • Trading Exchange Forecasting Retail and E-Commerce • Price Optimization • Recommendation • Predictive Inventory Planning • Fraud Detection • Customer Segmentation Manufacturing • Predictive Maintenance or Condition Monitoring. • Warranty reserve estimation • Demand forecasting • Process Optimization Marketing and Sales • Market and Customer Segmentation • Price Optimization • Customer Churn Analysis • Customer lifetime value prediction • Sentiment Analysis in Social Networks
  39. 39. Output / Pattern / Model / Knowledge: 1. Formula / Function • T = 0.48O + 0.23TL + 0.5D 2. Decision Tree 3. Correlation or Association 4. Rule • IF IPS3=2.8 THEN graduate_ontime 5. Cluster
  40. 40. Learning Illustration A BA B A B A B A B A B A B A B A B Data -> Two Possible Solutions 1 2
  41. 41. Supervised Learning • It is based on a labeled training set. • The class of each piece of data in the training set is known. • Class labels are pre-determined and provided in the training phase. [Figure: data points labelled with class A or class B; “What is the class of this data point?”] Task performed : classification, pattern recognition
  42. 42. Supervised Learning •Prediction methods are commonly referred to as supervised learning. Supervised methods are thought to attempt the discovery of the relationships between input attributes and a target attribute. •A training set is given and the objective is to form a description that can be used to predict unseen examples.
  43. 43. Supervised Learning Problems : • Classification • The domain of the target attribute is finite and categorical. • A classifier must assign a class to an unseen example. • Regression • The target attribute is formed by infinite values. • To fit a model to learn the output target attribute as a function of input attributes. • Time Series Analysis • Making predictions in time.
  44. 44. Supervised Learning
  45. 45. Supervised Learning Class/Label/Target Attribute/Feature Nominal Numeric
  46. 46. Unsupervised Learning • Input : set of patterns P, from n-dimensional space S, but little or no information about their classification, evaluation, interesting features, etc. It must learn these by itself! : ) • Tasks: - Clustering - Group patterns based on similarity - Vector Quantisation - Fully divide up S into a small set of regions (defined by codebook vectors) that also helps cluster P. - Feature Extraction - Reduce dimensionality of S by removing unimportant features (i.e. those that do not help in clustering P) • There is no supervisor and only input data is available. • The aim is now to find regularities, irregularities, relationships, similarities and associations in the input.
  47. 47. Unsupervised Learning Problems : • Clustering • Association Rules • Pattern Mining • It is adopted as a more general term than frequent pattern mining or association mining • Outlier Detection • It is the process of finding data which have very different behaviour from the expectation (outliers or anomalies)
  48. 48. Unsupervised Learning
  49. 49. Unsupervised Learning Attribute/Feature
  50. 50. Reinforcement Learning. Background : • How to learn a new skill • Learning and intelligence • Interaction with environment • Goal-oriented learning • Agent – Environment interactions • Activities - What to do - How to map situations to actions - Process positive and negative rewards. Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximise some notion of cumulative reward. The basic reinforcement learning problem is modelled as a Markov decision process, which is often stochastic.
  51. 51. The Analogy • A child learns to walk • The child is an agent trying to manipulate the environment • The child is taking actions (state 1, state 2, state 3, and so on) • Positive rewards when able to walk • Negative rewards when not able to walk Reinforcement Learning
  52. 52. Reinforcement Learning Various Practical applications of Reinforcement Learning • RL can be used in robotics for industrial automation. • RL can be used in machine learning and data processing • RL can be used to create training systems that provide custom instruction and materials according to the requirement of students.
  53. 53. Data Preparation (CRISP-DM) Data Preprocessing • Measures for data quality: A multidimensional view • Accuracy: correct or wrong, accurate or not • Completeness: not recorded, unavailable, … • Consistency: some modified but some not, … • Timeliness: timely update? • Believability: how far can the data be trusted to be correct? • Interpretability: how easily can the data be understood?
  54. 54. Data Preprocessing Task: 1. Data Cleaning a. Fill in missing values b. Smooth noisy data c. Identify or remove outliers d. Resolve inconsistencies 2. Data Reduction a. Dimensionality reduction b. Numerosity reduction c. Data compression 3. Data Transformation and Data Discretisation a. Normalisation b. Concept hierarchy generation 4. Data Integration a. Integration of multiple databases or files
  55. 55. Common Data Analytics Rules
      Tasks | Description | Algorithms | Examples
      Classification | Predict if a data point belongs to one of the predefined classes. Prediction based on learning from known dataset. | Decision tree, neural network | Bucketing new customers into one of the known customer groups
      Regression | Predict the numeric target label of a data point. Prediction based on learning from known dataset. | Linear regression, logistic regression | Estimating insurance premium
      Clustering | Identify natural clusters within the data set based on inherent properties within the data set. | K-Means, density based clustering | Finding customer segments in a company based on transaction and call data
      Association Rules | Identify relationships within an item set based on transaction data | FP-Growth algorithm, Apriori | Find cross-selling opportunities for a retailer based on transaction purchase history
      Anomaly Detection | Predict if a data point is an outlier compared to other data points in the dataset | Distance based, density based, Local Outlier Factor (LOF) | Fraud transaction detection in credit cards
  56. 56. Estimation: Pizza Delivery Time
      Customer | Order | Number of Traffic Lights | Distance | Travel Time (label)
      1 | 3 | 3 | 3 | 16
      2 | 1 | 7 | 4 | 20
      3 | 2 | 4 | 6 | 18
      4 | 4 | 6 | 8 | 36
      ...
      1000 | 2 | 4 | 2 | 12
      Learning: model built using an estimation method (linear regression)
      Knowledge: Travel Time = 0.48 O + 0.23 TL + 0.5 D (O = order, TL = number of traffic lights, D = distance)
  57. 57. Predictions
      • Dataset: stock prices in time-series format, with a label
      • Learning: prediction using a neural network
      • Result: prediction plot
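  58. 58. Classification
      Student ID (NIM) | Gender | National Exam Score | School of Origin | GPA Sem. 1 | GPA Sem. 2 | GPA Sem. 3 | GPA Sem. 4 | ... | Graduated on Time (label)
      10001 | M | 28 | SMAN 2 | 3.3 | 3.6 | 2.89 | 2.9 | ... | Yes
      10002 | F | 27 | SMA DK | 4.0 | 3.2 | 3.8 | 3.7 | ... | No
      10003 | F | 24 | SMAN 1 | 2.7 | 3.4 | 4.0 | 3.5 | ... | No
      10004 | M | 26.4 | SMAN 3 | 3.2 | 2.7 | 3.6 | 3.4 | ... | Yes
      ...
      11000 | M | 23.4 | SMAN 5 | 3.3 | 2.8 | 3.1 | 3.2 | ... | Yes
      Learning: using the C4.5 classification method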
  59. 59. Classification
      input: golf playing recommendation
      output (rules):
      If outlook = sunny and humidity = high then play = no
      If outlook = rainy and windy = true then play = no
      If outlook = overcast then play = yes
      If humidity = normal then play = yes
      If none of the above then play = yes
      output (tree): shown as a figure on the slide
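The rule output above can be read as an ordered decision list: the first rule whose condition matches decides the outcome. A minimal sketch, assuming the rules are evaluated top to bottom (the function name `play_golf` and the string and boolean encodings are illustrative, not from the slide):

```python
def play_golf(outlook, humidity, windy):
    """Apply the slide's rules in order; the first matching rule wins."""
    if outlook == "sunny" and humidity == "high":
        return "no"
    if outlook == "rainy" and windy:
        return "no"
    if outlook == "overcast":
        return "yes"
    if humidity == "normal":
        return "yes"
    # Default rule: none of the above matched.
    return "yes"

print(play_golf("sunny", "high", False))
print(play_golf("overcast", "high", True))
```

A learner such as C4.5 would derive rules like these from labelled examples; here they are simply hard-coded to show how the learned knowledge is applied to a new record.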
  60. 60. Clustering
      • Dataset without labels
      • Learning: using the K-means clustering method
  61. 61. Association
      • Learning: using the FP-Growth association method
  62. 62. Algorithms in Data Analytics
      1. Estimation: Linear Regression, Neural Network, Support Vector Machine, etc.
      2. Prediction/Forecasting: Linear Regression, Neural Network, Support Vector Machine, etc.
      3. Classification: Naive Bayes, K-Nearest Neighbor, C4.5, ID3, CART, Linear Discriminant Analysis, Logistic Regression, etc.
      4. Clustering: K-Means, K-Medoids, Self-Organizing Map (SOM), Fuzzy C-Means, etc.
      5. Association: FP-Growth, Apriori, Coefficient of Correlation, Chi Square, etc.
  63. 63. Information-Based Learning
      Based on information theory. For example, a decision tree model is induced using the concepts of entropy and information gain.
      Example questions: Is it a man? Does the person wear glasses?
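Entropy and information gain, the two quantities named on this slide, can be computed directly from label counts. The sketch below uses the standard Shannon entropy and a tiny hypothetical dataset; the feature names `man` and `glasses` echo the slide's example questions, while the rows themselves are invented for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, feature, target):
    """Reduction in entropy of `target` after splitting `rows` on `feature`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical records: the class is fully determined by "man", not by "glasses".
rows = [
    {"man": "yes", "glasses": "yes", "cls": "A"},
    {"man": "yes", "glasses": "no", "cls": "A"},
    {"man": "no", "glasses": "yes", "cls": "B"},
    {"man": "no", "glasses": "no", "cls": "B"},
]
gain_man = information_gain(rows, "man", "cls")
gain_glasses = information_gain(rows, "glasses", "cls")
print(gain_man, gain_glasses)
```

A decision tree learner picks the question with the highest gain at each node, so in this toy data it would ask "Is it a man?" first.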
  64. 64. Similarity-Based Learning
      Basic idea: if it walks like a duck and quacks like a duck, then it is probably a duck.
      • The best way to make a prediction is to look at what has worked well in the past and predict the same thing again.
      • Examples: the k-NN and k-means algorithms.
      • Similarity can be represented as a distance (e.g. Euclidean).
      Figure: training records and a test record, with the distance computed between them
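As a rough illustration of the similarity-based idea, here is a small k-nearest-neighbour classifier using Euclidean distance. The training points and labels are hypothetical, and `knn_predict` is an illustrative name rather than anything defined in the slides.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8

def knn_predict(training, query, k=3):
    """Return the majority label among the k training records closest to `query`."""
    neighbours = sorted(training, key=lambda rec: dist(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical training records: (feature vector, label).
training = [
    ((1.0, 1.0), "duck"), ((1.2, 0.8), "duck"), ((0.9, 1.1), "duck"),
    ((5.0, 5.0), "goose"), ((5.2, 4.9), "goose"), ((4.8, 5.1), "goose"),
]
print(knn_predict(training, (1.1, 0.9)))
```

The test record is assigned the label of the training records it most resembles, which is exactly the "predict what worked in the past" idea.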
  65. 65. Probability-Based Learning
      Probability-based prediction approaches are heavily based on Bayes' theorem.
      • A probabilistic framework for solving classification problems
      • Conditional probability / Bayes' theorem:
        $$P(C \mid A) = \frac{P(A \mid C)\,P(C)}{P(A)}$$
      • Given:
        • A doctor knows that meningitis causes a stiff neck 50% of the time
        • The prior probability of any patient having meningitis is 1/50,000
        • The prior probability of any patient having a stiff neck is 1/20
      • If a patient has a stiff neck, what is the probability that he/she has meningitis?
        $$P(M \mid S) = \frac{P(S \mid M)\,P(M)}{P(S)} = \frac{0.5 \times 1/50000}{1/20} = 0.0002$$
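The meningitis example can be checked numerically. This short sketch plugs the slide's three probabilities into Bayes' theorem; the function name `posterior` is illustrative.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(C|A) = P(A|C) * P(C) / P(A)."""
    return likelihood * prior / evidence

p_stiff_given_men = 0.5      # P(S|M): meningitis causes a stiff neck 50% of the time
p_men = 1 / 50000            # P(M): prior probability of meningitis
p_stiff = 1 / 20             # P(S): prior probability of a stiff neck

p_men_given_stiff = posterior(p_stiff_given_men, p_men, p_stiff)
print(round(p_men_given_stiff, 6))
```

The result is about 0.0002: even with a stiff neck, meningitis remains very unlikely because its prior probability is so small.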
  66. 66. Error-Based Learning
      Perform a search for a set of parameters for a parameterised model that minimises the total error across the predictions made by that model with respect to a set of training instances. Examples: multivariable linear regression with gradient descent, support vector machine.
      • Linear Regression (figure)
      • Support Vector Machine: find the hyperplane that maximises the margin => B1 is better than B2 (figure: boundaries B1 and B2 with margin lines b11, b12 and b21, b22)
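To make the "search for parameters that minimise the total error" concrete, the sketch below fits a linear model by batch gradient descent on the mean squared error. For brevity it uses a single input variable rather than the multivariable case named on the slide; the learning rate, epoch count, and data are assumptions chosen for illustration.

```python
def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = 2 / n * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = 2 / n * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = fit_linear(xs, ys)
print(round(w, 3), round(b, 3))
```

Each step moves the parameters a little in the direction that reduces the error, so the fitted values approach the slope 2 and intercept 1 used to generate the data.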
  67. 67. Model Evaluation
      1. Estimation: error measures such as Root Mean Square Error (RMSE), MSE, MAPE, etc.
      2. Prediction/Forecasting: error measures such as Root Mean Square Error (RMSE), MSE, MAPE, etc.
      3. Classification: confusion matrix (accuracy); ROC curve (Area Under Curve, AUC)
      4. Clustering: internal evaluation (Davies–Bouldin index, Dunn index); external evaluation (Rand measure, F-measure, Jaccard index, Fowlkes–Mallows index, confusion matrix)
      5. Association: lift charts (lift ratio); precision and recall (F-measure)
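The error measures listed for estimation and forecasting can be sketched as follows, using their usual definitions (MSE as the mean squared difference, RMSE as its square root, MAPE as the mean absolute percentage error). The actual and predicted values are hypothetical.

```python
from math import sqrt

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return sqrt(mse(actual, predicted))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 / len(actual) * sum(abs((a - p) / a) for a, p in zip(actual, predicted))

# Hypothetical observations and model outputs.
actual = [100, 200, 400]
predicted = [110, 190, 400]
print(mse(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

RMSE is expressed in the same unit as the target, while MAPE is scale-free, which is why the two are often reported together.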
  68. 68. Model Evaluation
      Learning and evaluation process; confusion matrix; evaluation metrics
      Confusion matrix:
      | PREDICTED Class=Yes | PREDICTED Class=No
      ACTUAL Class=Yes | a | b
      ACTUAL Class=No | c | d
      a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)
      $$\text{Accuracy} = \frac{a + d}{a + b + c + d} = \frac{TP + TN}{TP + TN + FP + FN}$$
      $$\text{Precision } (p) = \frac{a}{a + c} \qquad \text{Recall } (r) = \frac{a}{a + b} \qquad \text{F-measure } (F) = \frac{2rp}{r + p} = \frac{2a}{2a + b + c}$$
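The confusion-matrix metrics can be computed directly from the four cell counts, using the slide's naming (a = TP, b = FN, c = FP, d = TN). The counts below are hypothetical and the function name `evaluate` is illustrative.

```python
def evaluate(tp, fn, fp, tn):
    """Compute accuracy, precision, recall and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * recall * precision / (recall + precision)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Hypothetical counts: a=40 (TP), b=10 (FN), c=20 (FP), d=30 (TN).
metrics = evaluate(tp=40, fn=10, fp=20, tn=30)
print(metrics)
```

With these counts the F-measure equals 2a / (2a + b + c) = 80 / 110, which matches the closed form of the harmonic mean of precision and recall.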
  69. 69. Model Evaluation • Learning curve shows how accuracy changes with varying sample size • Requires a sampling schedule for creating learning curve: - Arithmetic sampling - Geometric sampling • Effect of small sample size: - Bias in the estimate - Variance of estimate
  70. 70. K-Fold Cross-Validation
      Increase coverage
      Experiment | Accuracy
      1 | 93%
      2 | 91%
      3 | 90%
      4 | 93%
      5 | 93%
      6 | 91%
      7 | 94%
      8 | 93%
      9 | 91%
      10 | 90%
      Average accuracy | 92%
      Figure (Dataset column): the orange box marks the k-th subset, used as testing data in that experiment
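The cross-validation procedure behind the table can be sketched as an index splitter: the data is cut into k subsets, and each subset serves once as the test set while the rest is used for training. The splitter below is a plain illustration (no shuffling or stratification, which real experiments often add); the accuracy list is taken from the ten experiments on the slide.

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs; every sample is tested exactly once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for train, test in folds:
    print("train:", train, "test:", test)

# Per-experiment accuracies (in percent) from the slide's table.
accuracies = [93, 91, 90, 93, 93, 91, 94, 93, 91, 90]
average = mean(accuracies)
print(average)
```

The mean of the ten accuracies is 91.9%, which the slide rounds to 92%.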
  71. 71. The Future ML Trends: artificial neural network, convolutional neural network, deep learning
  72. 72. Social Media Analytics
  73. 73. Workflow
      Application Programming Interface (API) crawling process > network structure (Social Network Analysis) > content analysis (Text Analytics)
      Pattern mining and analytics process
  74. 74. Social Network Analysis
  75. 75. Content Analysis
  76. 76. Text Network
  77. 77. Topic Modelling
      • Topic modelling is a type of statistical modelling for discovering the abstract "topics" that occur in a collection of documents.
      • LDA (Latent Dirichlet Allocation) is the most popular (and typically most effective) topic modelling technique.
      Figure: first topic identified
  78. 78. TOP BRAND ALTERNATIVE MEASUREMENT BASED ON CONSUMER NETWORK ACTIVITY
      Abstract: In business intelligence efforts, the legacy methodology for measuring product brand awareness uses techniques such as surveys, interviews, and questionnaires. This methodology requires expensive effort to collect data from respondents and takes considerable time to accomplish. The availability of Big Data in the form of social media interaction can benefit us. The conversations and user-generated content from social media can be used to measure brand awareness through consumer activity. We use Social Network Analysis methodology to measure the dynamics and evolution of brand conversations in social media. By comparing the network properties, we propose new alternative measurement methods for product brand awareness. Our proposed methodology is better adapted to large-scale conversational data in social media. This measurement also enhances the current methodology by viewing consumer opinions as a whole network and not as separate individuals. The study was conducted on social networking conversations on Twitter using two industry case studies: mobile operators and mobile phone brands in Indonesia.
      Figures: mobile phone rank; mobile operator rank
  79. 79. A COMPARISON OF INDONESIA E-COMMERCE SENTIMENT ANALYSIS FOR MARKETING INTELLIGENCE EFFORT. CASE STUDY: BUKALAPAK, TOKOPEDIA, ELEVENIA
      Abstract: The rapid growth of the e-commerce market in Indonesia has led to the appearance of various e-commerce companies and to high competition among them. Marketing intelligence is an important activity for measuring competitive position. One element of marketing intelligence is assessing customer satisfaction. Many Indonesian customers express their satisfaction or dissatisfaction with a company through social media. Hence, social media data provides a new practical way to support marketing intelligence efforts. This research performs sentiment analysis using the naive Bayes classification method with TF-IDF weighting. We compare the sentiments towards the top-3 most visited e-commerce sites: Bukalapak, Tokopedia and Elevenia. We use Twitter data for sentiment analysis because it is faster, cheaper and easier from both the customer and the researcher side. The purpose of this research is to find out how to process the huge volume of customer sentiment on Twitter into useful information for e-commerce companies, and which of the top-3 e-commerce companies has the highest level of customer satisfaction. The experiment results show that the method can be used to classify customer sentiments on Twitter automatically, and that Elevenia is the e-commerce company with the highest customer satisfaction.
      Figure: comparable results among the three case studies
  80. 80. NETWORK TEXT ANALYSIS TO SUMMARISE ONLINE CONVERSATIONS FOR MARKETING INTELLIGENCE EFFORTS IN TELECOMMUNICATION INDUSTRY
      Abstract: Tight market competition puts pressure on companies to employ new and faster ways to support their marketing intelligence efforts. Marketing intelligence includes gathering and analysing data for confident decision making about the market and its competition. Today, abundant large-scale data from online social network services makes it possible to extract valuable information such as user opinions and sentiment from conversations in the market. As competition rises, new challenges emerge, including faster data summarisation. The common practice for summarising content is to use a word cloud or a weighted list of word occurrences. This approach lacks sense and contextual relations between the words in question, because each word has no connection with other words that might together form an important phrase. With the help of graph formulation, we propose a methodology of network text analysis to summarise large conversations in online social network services. The proposed methodology captures complex relations between words while still maintaining fast summarisation. In this paper, we compare three major telecommunication providers in Indonesia: Telkomsel, XL and Indosat. The conversations about those brands on the online social network service Twitter are collected, and a text network for each brand is constructed and analysed.
  81. 81. NETWORK MARKET ANALYSIS USING LARGE SCALE SOCIAL NETWORK CONVERSATION OF INDONESIA FAST FOOD INDUSTRY
      Abstract: The high competitiveness of the Indonesian fast food market has forced the industry to find new ways to understand market behaviour. The new challenge includes faster data collection and analytical processes, preferably with delivery times close to real time. The common practice of gathering market data using questionnaires and interviews is considered expensive and time-consuming compared with mining online conversations in the respective brand communities. With the availability of large-scale data from online social network services (oSNS), we can extract valuable information representing the dynamic behaviour of the market. Many brands have a presence in oSNS as part of their customer relationship management (CRM) effort. The social interactions formed in oSNS can be modelled using Social Network Analysis (SNA) methodology. In this paper, we compare the brand communities of two head-to-head competing products in the fast food industry: McDonald's and Burger King. The SNA model constructs a large-scale network whose size reaches close to a million nodes and edges. The result gives us insight into what is important in understanding the dynamic market besides the market size represented by the community conversations.
  82. 82. SOCIAL NETWORK AND SENTIMENT ANALYSIS FOR SOCIAL CUSTOMER RELATIONSHIP MANAGEMENT IN INDONESIA BANKING SECTOR
      Abstract: The increasing number of social media users affects both individual and corporate users. The banking sector, for example, uses social media to support its Social Customer Relationship Management (SCRM) activity. We investigate the dynamics and evolution of the conversation network between bank customers using Social Network Analysis methodology. Measurement is conducted by calculating network properties to see the characteristics of the network and how active it is. Customers talking about banks' services can also express their opinions on social media. Therefore, we perform sentiment analysis to classify customers' opinions into positive, negative and neutral classes. This research was performed on Twitter conversations about Bank Mandiri, Bank Central Asia (BCA) and Bank Negara Indonesia (BNI). The result of this research is beneficial for business intelligence purposes to support decision making.
      Figure: SCRM network for BCA, BNI and Mandiri
  83. 83. MEASURING MARKETING COMMUNICATIONS MIX EFFORT USING MAGNITUDE OF INFLUENCE AND INFLUENCE RANK METRIC
      Abstract: In the context of modern marketing, Twitter is considered a communication platform for spreading information. Many companies create and acquire several Twitter accounts to support and perform a variety of marketing mix activities. Initially, each account is used to capture a specific market profile. Together, the accounts create a network of information that provides consumers with the information they need, depending on their contextual utilisation. With many accounts available, the fundamental question is how to measure the influence of each account in the market based not only on their relations but also on the effects of their postings. The Magnitude of Influence (MOI) metric is adapted together with the Influence Rank (IR) measurement of accounts in their social network neighbourhood. We use a social network analysis approach to analyse 65 accounts in the social network of an Indonesian mobile phone network operator, Telkomsel, which are involved in marketing communications mix activities through series of related tweets. Using the social network provides an idea of the activity in building and maintaining relationships with the target audience. This paper shows the most potential accounts based on network structure and engagement. Based on this research, the more followers an account has, the more responsibility it has to generate interaction from its followers in order to achieve the expected effectiveness. The focus of this paper is to determine the most potential accounts in the application of the marketing communications mix on Twitter.
      Figures: ratio of affection; magnitude of influence; LCRT function; influence rank (based on PageRank)
  84. 84. MAPPING ONLINE TRANSPORTATION SERVICE QUALITY AND MULTI-CLASS CLASSIFICATION PROBLEM SOLVING PRIORITIES. CASE STUDY: GOJEK AND GRAB
      Abstract: Online transportation services are known for their accessibility, transparency, and tariff affordability. These points give online transportation advantages over existing conventional transportation services. Online transportation is an example of a disruptive technology that changes the relationship between customers and companies. In Indonesia, there is high competition among online transportation providers, hence the companies must maintain and monitor their service level. To understand their position, we apply both sentiment analysis and multiclass classification to understand customer opinions. From negative sentiments, we can identify problems and establish problem-solving priorities. As a case study, we use the most popular online transportation providers in Indonesia: Gojek and Grab. Since many customers actively give compliments and complaints about the companies' service level on Twitter, we collected 61,721 tweets in Bahasa during one month of observation. We apply Naive Bayes and Support Vector Machine methods to see which model performs best for our data. The result reveals that Gojek has better service quality, with 19.76% positive and 80.23% negative sentiments, than Grab, with 9.2% positive and 90.8% negative. The highest problem-solving priority for Gojek concerns application problems, while for Grab it concerns unusable promos. The overall result shows that the general problems of both case studies are related to the accessibility dimension, which indicates a lack of capability to provide good digital access to end users.
  85. 85. HYBRID SENTIMENT AND NETWORK ANALYSIS OF SOCIAL OPINION POLARIZATION
      Abstract: The rapid growth of social media and user-generated content (UGC) has provided a rich source of potentially relevant data. The problem is how to summarise this data in order to understand it and transform it into information. Twitter, as one of the most popular social networking and micro-blogging services, can be analysed in terms of the content produced, using sentiment analysis. On the other hand, several types of networks can also be constructed to analyse the social network structure and network properties. This research intends to combine the content and structural approaches into a hybrid approach for identifying social opinion polarisation in the form of a conversation network. Sentiment analysis is used to determine public sentiment, and social network analysis is used to analyse the structure of the network, detecting communities and influential actors in the network. Using this hybrid approach, we gain a comprehensive understanding of social opinion polarisation. As a case study, we present real social opinion polarisation about the reclamation issue in Indonesia.
  86. 86. DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELLING. Case Study: Uber
      Digital flows now exert a larger impact and the world is more connected than ever; the amount of cross-border bandwidth used has grown 45 times since 2005. With the massive amount of data spreading on the net, including social media, speed is one of the most essential factors in business. Companies can take advantage of social media as a source to analyse and extract customers' opinions, and can therefore respond quickly to the situation. The main purpose of this research is content analysis; to reach this goal, we need to extract the information as well as summarise the topics inside it. However, for quick content analysis there is a wide choice of tools, each with its specific output, which creates challenges in the process. We use Naïve Bayes sentiment analysis based on time series, specifically on a daily basis, and topic modelling based on Latent Dirichlet Allocation (LDA) to evaluate the sentiment as well as to model the topics discussed. The purpose of this research is to help both companies and individuals map public opinion towards a certain topic by analysing the sentiment of the text and creating a topic model. Real-time information for determining consumer opinion therefore becomes crucial. Twitter can serve this purpose as one source of real-time information from user-generated content. We pick Uber as the case study, as it is viewed as one of the most favoured transportation methods in most parts of the world. The data collection period is from 10 February 2017 until 28 February 2017, with 1,048,576 tweets collected.
  87. 87. ANALYSING EMPLOYEE VOICE USING REAL-TIME FEEDBACK (*ICST, 2018)
      Abstract: People nowadays tend to use social media as a platform to share their reviews, emotions, and opinions, including about their jobs. Thus, a lot of data is available on the web, and a rapid response is needed to analyse and interpret it. Unfortunately, many organisations still use annual surveys to assess satisfaction, engagement, and culture in the workplace. Compared with conventional datasets such as company surveys and questionnaires, the interpreted data could help decision-makers make decisions more effectively and efficiently. This may be done with the help of sentiment analysis methods. In this research, we classify the feedback based on its category and sentiment. Several classification algorithms are used in opinion mining; two of them are the Naive Bayes Classifier (NBC) and the Support Vector Machine (SVM). This paper aims to classify feedback based on sentiment using NBC and SVM.
  88. 88. MONTE CARLO SIMULATION AND CLUSTERING FOR CUSTOMER SEGMENTATION IN BUSINESS ORGANISATION
      Abstract: Utilising data for segmentation analysis can bring a streamlined way to gain potential insight as decision-making support in a business organisation. Using appropriate data analytical techniques helps organisations profile their customer segments accurately, which leads to an effective marketing strategy. However, there are times in data analytics when the organisation needs another variable whose values are unavailable, for example customers' income data, which is mostly hard to collect. By using Monte Carlo simulation, the values of customers' income can be generated and then compared with customer spending to construct a customer segmentation model. Unsupervised learning for the customer segmentation model using K-Means clustering enables us to see the grouping patterns of customers' income against their spending. Clusters in the dataset might be interpreted as groups of customers with similar characteristics. This paper shows how to generate customers' income data and create data clusters to optimise customer potential by utilising data. Furthermore, the result gives us insight into which groups of customers might not be served properly, considering their average income and their spending behaviour.
  89. 89. MAPPING ORGANISATION KNOWLEDGE NETWORK AND SOCIAL MEDIA BASED REPUTATION MANAGEMENT
      Abstract: Knowledge management and reputation are important aspects of an organization, especially in the ICT industry. Controlling knowledge management and modelling personal reputation through social media is essential for the organization, because we can see how employees build relationships around their peer networks or clients virtually and how a knowledge network can support organization performance. The purpose of this research is to map the knowledge network and formulate reputation in order to fully understand how knowledge flows in an organization and whether employee reputation has a higher degree of influence in the organization's knowledge network. In particular, we develop formulas to measure knowledge networks and personal reputation based on social media activities. As a case study, we pick an Indonesian ICT company which actively builds its business around its employees' peer knowledge outside the company. For the knowledge network, we collect data by conducting interviews. For reputation management, we crawl data from several popular social media. We base our work on Social Network Analysis methodology. The result shows that employees' knowledge is directly proportional to their reputation, but that reputation levels differ across the social media observed in this research.
      Figure: reputation formulas for Twitter, Instagram and LinkedIn
  90. 90. PREDICTION MODELS BASED ON FLIGHT TICKETS AND HOTEL ROOMS DATA SALES FOR RECOMMENDATION SYSTEM IN ONLINE TRAVEL AGENT BUSINESS
      Abstract: Indonesia's position as one of the favourite vacation destinations of domestic and foreign travellers has made the value of investment in the tourism industry continue to grow significantly. This has created more Online Travel Agent businesses in recent years. However, many conventional travel and Umrah travel businesses in Indonesia are threatened with bankruptcy as online travel businesses spread into the conventional market for ticket sales and travel tours. The case study in this research differs from Online Travel Agent businesses in general because it works with real-time analytics, using flight ticket and hotel room sales data to create a prediction or recommendation model. Data mining, the extraction of hidden predictive information from large databases, is a powerful technique with great potential to help companies focus on the most important information in their data warehouse. Using a classification method from data mining, the objective of this paper is to create predictive models from flight ticket and hotel room sales data using the decision tree classification approach. The result of this paper is beneficial for businesses and can be used as a basic algorithm for programming an Online Travel Agent recommendation feature.
  91. 91. EFFECTIVE KNOWLEDGE MANAGEMENT USING BIG DATA AND SOCIAL NETWORK ANALYSIS
      Abstract: Knowledge management consists of identifying, creating, representing, distributing, and enabling the adoption of insights and experiences in an organization. One approach to modelling knowledge management is to use a network model. Big Data is an important part of the ICT technological roadmap, whose main function is modelling behaviour and supporting organization decisions. Social Network Analysis is a micro version of Big Data in which we can model and establish social network quantification. In this paper we show how Social Network Analysis can help an organization apply Knowledge Management strategies and practices, through an experiment using a real-world large dataset containing 360000+ email exchanges between 36000+ employees inside an organization.
      Figures: visualisation of the hierarchical structure of the organization and the knowledge flow of the informal organization; business case resolved using SNA methodology; map of the full network of email exchanges between employees in Enron
  92. 92. INDONESIA INFRASTRUCTURE AND CONSUMER STOCK PORTFOLIO PREDICTION USING ARTIFICIAL NEURAL NETWORK BACKPROPAGATION (*ICOICT, 2017)
      Abstract: The Artificial Neural Network (ANN) method is increasingly popular for building predictive models that generate small prediction errors. To produce a good model, ANN needs a large dataset as input. ANN backpropagation is a gradient descent method that minimises the squared output error. Stock price movements suit the ANN requirement: they form a large dataset because stock prices are recorded up to every second, usually called high-frequency data. The implementation of stock price prediction using the ANN approach is quite new. The predictive model helps investors in building a stock portfolio and in their decision-making process. Buying several stocks in a portfolio reduces diversifiable risk and increases the chance of a higher return. In this paper, we show how to generate a prediction model of stock prices using artificial neural network backpropagation, and how to form a portfolio with the predicted prices that yields the portfolio prediction with the smallest error. The dataset we use is historical stock price data from ten different company stocks in the infrastructure and consumer sectors of the Indonesia Stock Exchange. The results show that under lower-risk conditions, the ANN predictive model gives a higher expected return than the return from real conditions, while under higher risk, the return from real conditions is higher than that of the ANN predictive model.
  93. 93. THE DYNAMIC OF BANKING NETWORK TOPOLOGY. Case Study: Indonesian Presidential Election Event
      Abstract: Information and communication technologies have brought major changes in data storage and processing. Data of various types and in high volume has been digitised and supports mining-based data processing to provide knowledge in a modern and efficient way. Banking transaction data is stored digitally and is suitable for the mining process, especially in a network science model. Understanding transaction system risk requires a fundamental study of payment flows and bank behaviour in various situations. The failure of Lehman Brothers, whose contagion impact spread in a short time, indicates that financial markets have interdependent properties and are connected to each other in a large network. Thus, an overall system network approach becomes more important than looking at a single bank. Political conditions greatly affect economic stability, including the banking and financial sectors. A presidential election is a major political event for a nation, and it affects community sentiment and the financial market. However, the linkage between political events and topological changes is poorly understood. This research presents an insight into event-driven dynamic network topology, with banking transactions as a case study. We search for the banking transaction network topology dynamics driven by the 2014 Indonesian presidential election. We discover that banks are more engaged with others, at larger values, 3 days before the end of the campaign period, and less engaged with others, at smaller values, at the end of the campaign period. Unique transaction activity between banks remains stable, with a slight decline at the end of the campaign period. This scenario provides the possibility to learn the banking transaction pattern and to support supervision of financial system stability.
  94. 94. A COMPARATIVE STUDY OF EMPLOYEE CHURN PREDICTION MODEL
      Abstract: The churn phenomenon commonly occurs in customer loyalty towards a brand's products or services. It becomes a critical issue that any industry makes its best effort to avoid. A churn problem may also arise within the organisation, where it is called employee churn. Employee churn creates myriad adverse effects for the organisation, as it correlates with unfair workload distribution, a great deal of money lost, and the extra time needed to find a replacement, which may result in a rise in the customer dissatisfaction rate. The purpose of this study is to find the best model to predict employee churn. A successful prediction model for employee churn is needed in order to avert various negative impacts on the organisation. There are three popular classification models for prediction, namely naïve Bayes, decision tree, and random forest. This study compares the performance of these models using Human Resource Information System (HRIS) data from one of Indonesia's renowned telecommunication companies. The data collected for the study spans a 2-year period, from 2015 until 2017. The findings suggest that the best classification model is random forest, with an accuracy of 97.5%. The second-best method is naïve Bayes with 96.6%, and the lowest classification accuracy belongs to the decision tree with 88.7%. The study concludes that the most reliable and accurate classification model to predict employee churn is random forest.
  95. 95. Conclusion
  96. 96. Complementary Methods
      STATISTICS | DATA ANALYTICS
      Confirmative | Explorative
      Small Data Set | Large Data Set
      Small Number of Variables | Large Number of Variables
      Deductive (no predictions) | Inductive
      Numeric Data | Numeric and Non-Numeric Data
      Clean Data | Data Cleaning
  97. 97. Opportunities and Challenges
      • big data provide granular, micro data
      • big data provide a relatively fast and cheap process
      • research opportunity on data science methods, implementation and evaluation maturity
      • data scientists help big data initiatives towards future and sustainable economic activities
      • uncovering hidden truths and democratisation by data are the primary objectives of the data scientist
      • hard to find data scientist talent
      • high cost to maintain data scientist talent
      • big data are often population studies, so no sampling error => methods familiarity
      • benefit > data quality + costs + security
      • ML result credibility (different algorithm, different conclusion)
  98. 98. Data Visualisation
  99. 99. The Power of Data is … every breath you take, every move you make, every bond you break, every step you take, I'll be watching you
  100. 100. without Big Data, you are blind and deaf in the middle of a freeway - Geoffrey Moore -
Big Data Analytics Presentation at International Workshop Colloquium Exploring Research Opportunity. School of Business and Management (SBM) - ITB. Bandung, 8 August 2019.
