SlideShare une entreprise Scribd logo
1  sur  78
Télécharger pour lire hors ligne
Apache	Hivemall:	Scalable	machine	learning	
library	for	Apache	Hive/Spark/Pig
1).	Research	Engineer,	Treasure	Data
Makoto	YUI	@myui
2017/5/16	Apache	BigData North	America	'17,	Miami
2).	Research	Engineer,	NTT
Takashi	Yamamuro @maropu
@ApacheHivemall
Plan	of	the	talk
1. Introduction	to	Hivemall
2. Hivemall	on	Spark
2017/5/16	Apache	BigData	North	America	'17,	Miami
2017/5/16	Apache	BigData	North	America	'17,	Miami
Hivemall	entered	Apache	Incubator	
on	Sept	13,	2016	 🎉
Since	then,	we	invited	2	contributors	as	new	committers	(a	
committer	has	been	voted	as	PPMC). Currently,	we	are	working	
toward	the	first	Apache	release	on	this	Q2.
hivemall.incubator.apache.org
What	is	Apache	Hivemall
Scalable	machine	learning	library	built	
as	a	collection	of	Hive	UDFs
2017/5/16	Apache	BigData	North	America	'17,	Miami
Multi/Cross	
platform VersatileScalableEase-of-use
Hivemall	is	easy	and	scalable	…
ML	made	easy	for	SQL	developers
Born	to	be	parallel	and	scalable
2017/5/16	Apache	BigData	North	America	'17,	Miami
Ease-of-use
Scalable
CREATE	TABLE	lr_model AS
SELECT
feature,	-- reducers	perform	model	averaging	in	parallel
avg(weight)	as	weight
FROM	(
SELECT	logress(features,label,..)	as	(feature,weight)
FROM	train
)	t	-- map-only	task
GROUP	BY	feature;	-- shuffled	to	reducers
This	query	automatically	runs	in	parallel	on	Hadoop
2017/5/16	Apache	BigData	North	America	'17,	Miami
Hivemall	is	a	multi/cross-platform ML	library
HiveQL SparkSQL/Dataframe API Pig	Latin
Hivemall	is	Multi/Cross	platform	..
Multi/Cross	
platform
prediction	models	built	by	Hive	can	be	used	from	Spark,	and	conversely,	
prediction	models	build	by	Spark	can	be	used	from	Hive
Hadoop	HDFS
MapReduce
(MRv1)
Hivemall
Apache	YARN
Apache	Tez
DAG	processing
Machine Learning
Query Processing
Parallel Data
Processing Framework
Resource Management
Distributed File System
Cloud Storage
SparkSQL
Apache	Spark
MESOS
Hive Pig
MLlib
Hivemall’s Technology	Stack
2017/5/16	Apache	BigData	North	America	'17,	Miami
Amazon	S3
2017/5/16	Apache	BigData	North	America	'17,	Miami
Hivemall	on	Apache	Hive
2017/5/16	Apache	BigData	North	America	'17,	Miami
Hivemall	on	Apache	Spark	Dataframe
2017/5/16	Apache	BigData	North	America	'17,	Miami
Hivemall	on	SparkSQL
2017/5/16	Apache	BigData	North	America	'17,	Miami
Hivemall	on	Apache	Pig
2017/5/16	Apache	BigData	North	America	'17,	Miami
Online	Prediction	by	Apache	Streaming
2017/5/16	Apache	BigData	North	America	'17,	Miami
Versatile
Hivemall	is	a	Versatile	library	..
ü Not	only	for	Machine	Learning
ü provides	a	bunch	of	generic	utility	functions
Each	organization	has	own	sets	of	
UDFs	for	data	preprocessing
Don’t	Repeat	Yourself!
Don’t	Repeat	Yourself!
2017/5/16	Apache	BigData	North	America	'17,	Miami
Hivemall	generic	functions
Array	and	Map Bit	and	compress String	and	NLP
We	welcome	contributing	your	generic	UDFs	to	Hivemall!
Geo	Spatial
Top-k	processing
> TF/IDF
> TILE
> MAP_URL
2017/5/16	Apache	BigData	North	America	'17,	Miami
Map	tiling	functions
2017/5/16	Apache	BigData	North	America	'17,	Miami
Tile(lat,lon,zoom)
= xtile(lon,zoom) + ytile(lat,zoom) * 2^n
Map	tiling	functions
Zoom=10
Zoom=15
2017/5/16	Apache	BigData	North	America	'17,	Miami
student class score
1 b 70
2 a 80
3 a 90
4 b 50
5 a 70
6 b 60
Top-k	query	processing
student class score
3 a 90
2 a 80
1 b 70
6 b 60
List	top-2	students	for	each	class
2017/5/16	Apache	BigData	North	America	'17,	Miami
student class score
1 b 70
2 a 80
3 a 90
4 b 50
5 a 70
6 b 60
Top-k	query	processing
List	top-2	students	for	each	class
SELECT	*	FROM	(		
SELECT		
*,	
rank()	over	(partition	by	class	order	by	score	desc)
as	rank
FROM	table
)	t
WHERE	rank	<=	2
RANK	over()	query	does	not	finishes	in	24	hours L
where	20	million	MOOCs	classes	and	avg	1,000	students	in	each	classes
2017/5/16	Apache	BigData	North	America	'17,	Miami
student class score
1 b 70
2 a 80
3 a 90
4 b 50
5 a 70
6 b 60
Top-k	query	processing
List	top-2	students	for	each	class
SELECT	
each_top_k(
2,	class,	score,	
class,	student		
)	as	(rank,	score,	class,	student)
FROM (
SELECT	*	FROM	table
DISTRIBUTE	BY	class	SORT	BY	class
)	t
EACH_TOP_K	finishes	in	2	hours J
2017/5/16	Apache	BigData	North	America	'17,	Miami
Top-k	query	processing	by	RANK	OVER()
partition	by	class
Node	1
Sort	by	class,	score
rank	over()
rank	>=	2
2017/5/16	Apache	BigData	North	America	'17,	Miami
Top-k	query	processing	by	EACH_TOP_K
distributed	by	class
Node	1
Sort	by	class
each_top_k
OUTPUT	only	K	
items
2017/5/16	Apache	BigData	North	America	'17,	Miami
Comparison	between	RANK	and	EACH_TOP_K
distributed	by	class
Sort	by	class
each_top_k
Sort	by	class,	score
rank	over()
rank	>=	2
SORTING IS	HEAVY
NEED	TO	
PROCESS	ALL
OUTPUT	only	K	
items
Each_top_k is	very	efficient	where	the	number	of	class	is	large
Bounded	Priority	Queue
is	utilized
List	of	Supported	Algorithms
Classification	
✓ Perceptron
✓ Passive	Aggressive	(PA,	PA1,	PA2)
✓ Confidence	Weighted	(CW)
✓ Adaptive	Regularization	of	Weight	
Vectors	(AROW)
✓ Soft	Confidence	Weighted	(SCW)
✓ AdaGrad+RDA
✓ Factorization	Machines
✓ RandomForest	Classification
2017/5/16	Apache	BigData	North	America	'17,	Miami
Regression
✓Logistic	Regression	(SGD)
✓AdaGrad (logistic	loss)
✓AdaDELTA (logistic	loss)
✓PA	Regression
✓AROW	Regression
✓Factorization	Machines
✓RandomForest	Regression
SCW is a good first choice
Try RandomForest if SCW does not
work
Logistic regression is good for getting a
probability of a positive class
Factorization Machines is good where
features are sparse and categorical ones
RandomForest	in	Hivemall
Ensemble	of	Decision	Trees
2017/5/16	Apache	BigData	North	America	'17,	Miami
Training	of	RandomForest
2017/5/16	Apache	BigData	North	America	'17,	Miami
Prediction	of	RandomForest
2017/5/16	Apache	BigData	North	America	'17,	Miami
Supported	Algorithms	for	Recommendation
2017/5/16	Apache	BigData	North	America	'17,	Miami
K-Nearest	Neighbor
✓ Minhash and	b-Bit	Minhash
(LSH	variant)
✓ Similarity	Search	on	Vector	
Space
(Euclid/Cosine/Jaccard/Angular)
Matrix	Completion
✓ Matrix	Factorization
✓ Factorization	Machines	
(regression)
each_top_k function	of	Hivemall	is	useful	for	
recommending	top-k	items
Other	Supported	Algorithms
2017/5/16	Apache	BigData	North	America	'17,	Miami
Feature	Engineering
✓Feature	Hashing
✓Feature	Scaling
(normalization,	z-score)	
✓ Feature	Binning	
✓ TF-IDF	vectorizer
✓ Polynomial	Expansion
✓ Amplifier
NLP
✓Basic	Englist text	Tokenizer	
✓Japanese	Tokenizer
Evaluation	metrics
✓AUC,	nDCG,	logloss,	precision	
recall@K,	and	etc
2017/5/16	Apache	BigData	North	America	'17,	Miami
Feature	Engineering	– Feature	Hashing
2017/5/16	Apache	BigData	North	America	'17,	Miami
Feature	Engineering	– Feature	Hashing
2017/5/16	Apache	BigData	North	America	'17,	Miami
Feature	Engineering	– Feature	Binning
Maps	quantitative	variables	to	fixed	number	of	
bins	based	on	quantiles/distribution
Map	Ages	into	3	bins
2017/5/16	Apache	BigData	North	America	'17,	Miami
Feature	Selection	– Signal	Noise	Ratio
2017/5/16	Apache	BigData	North	America	'17,	Miami
Evaluation	Metrics
Other	Supported	Features
2017/5/16	Apache	BigData	North	America	'17,	Miami
Anomaly	Detection
✓Local	Outlier	Factor	(LoF)
✓ChangeFinder
Clustering	/	Topic	models
✓Online	mini-batch	LDA
✓Online	mini-batch	PLSA
Change	Point	Detection
✓ChangeFinder
✓Singular	Spectrum	
Transformation
Efficient	algorithm	for	finding	change	point	and	outliers	from	
timeseries data
2017/5/16	Apache	BigData	North	America	'17,	Miami
J.	Takeuchi	and	K.	Yamanishi,	“A	Unifying	Framework	for	Detecting	Outliers	and	Change	Points	from	Time	Series,” IEEE		transactions	on	
Knowledge	and	Data	Engineering,	pp.482-492,	2006.
Anomaly/Change-point	Detection	by	ChangeFinder
Take	this…
Anomaly/Change-point	Detection	by	ChangeFinder
2017/5/16	Apache	BigData	North	America	'17,	Miami
2017/5/16	Apache	BigData	North	America	'17,	Miami
Anomaly/Change-point	Detection	by	ChangeFinder
…and	do	this!
Efficient	algorithm	for	finding	change	point	and	outliers	from	
timeseries data
2017/5/16	Apache	BigData	North	America	'17,	Miami
Anomaly/Change-point	Detection	by	ChangeFinder
J.	Takeuchi	and	K.	Yamanishi,	“A	Unifying	Framework	for	Detecting	Outliers	and	Change	Points	from	Time	Series,” IEEE		transactions	on	
Knowledge	and	Data	Engineering,	pp.482-492,	2006.
Efficient	algorithm	for	finding	change	point	and	outliers	from	
timeseries data
2017/5/16	Apache	BigData	North	America	'17,	Miami
Anomaly/Change-point	Detection	by	ChangeFinder
J.	Takeuchi	and	K.	Yamanishi,	“A	Unifying	Framework	for	Detecting	Outliers	and	Change	Points	from	Time	Series,” IEEE		transactions	on	
Knowledge	and	Data	Engineering,	pp.482-492,	2006.
2017/5/16	Apache	BigData	North	America	'17,	Miami
• T.	Ide	and	K.	Inoue,	"Knowledge	Discovery	from	Heterogeneous	Dynamic	Systems	using	Change-Point	
Correlations",	Proc.	SDM,	2005T.	
• T.	Ide	and	K.	Tsuda,	"Change-point	detection	using	Krylov	subspace	learning",	Proc.	SDM,	2007.
Change-point	detection	by	Singular	Spectrum	Transformation
2017/5/16	Apache	BigData	North	America	'17,	Miami
Online	mini-batch	LDA
ü XGBoost Integration
ü Sparse	vector	support	in	RandomForests
ü Docker	support	(for	evaluation)
ü Field-aware	Factorization	Machines
ü Generalized	Linear	Model
• Optimizer	framework	including	ADAM
• L1/L2	regularization
2017/5/16	Apache	BigData	North	America	'17,	Miami
Other	new	features	in	development
Copyright©2016 NTT corp. All Rights Reserved.
Hivemall on
44Copyright©2016 NTT corp. All Rights Reserved.
Who am I
• Takeshi Yamamuro
• NTT corp. in Japan
• OSS activities
• Apache Hivemall PPMC
• Apache Spark contributer
45Copyright©2016 NTT corp. All Rights Reserved.
Whatʼs Spark?
• Distributed data analytics engine,
generalizing Map Reduce
46Copyright©2016 NTT corp. All Rights Reserved.
Whatʼs Spark?
• 1. Unified Engine
• support end-to-end APs, e.g., MLlib and Streaming
• 2. High-level APIs
• easy-to-use, rich optimization
• 3. Integrate broadly
• storages, libraries, ...
47Copyright©2016 NTT corp. All Rights Reserved.
• Hivemall wrapper for Spark
• Wrapper implementations for DataFrame/SQL
• + some utilities for easy-to-use in Spark
• The wrapper makes you...
• run most of Hivemall functions in Spark
• try examples easily in your laptop
• improve some function performance in Spark
Whatʼs Hivemall on Spark?
48Copyright©2016 NTT corp. All Rights Reserved.
• Hivemall already has many fascinating ML
algorithms and useful utilities
• + High barriers to add newer algorithms in MLlib
Whyʼs Hivemall on Spark?
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
49Copyright©2016 NTT corp. All Rights Reserved.
• Most of Hivemall functions supported in Spark
v2.0 and v2.1
Current Status
- To compile for Spark v2.1
$ git clone https://github.com/apache/incubator-hivemall
$ cd incubator-hivemall
$ mvn package –Pspark-2.1–DskipTests
$ ls target/*spark*
target/hivemall-spark-2.1_2.11-XXX-with-dependencies.jar
...
50Copyright©2016 NTT corp. All Rights Reserved.
• Most of Hivemall functions supported in Spark
v2.0 and v2.1
Current Status
- To compile for Spark v2.0
$ git clone https://github.com/apache/incubator-hivemall
$ cd incubator-hivemall
$ mvn package –Pspark-2.0–DskipTests
$ ls target/*spark*
target/hivemall-spark-2.0_2.11-XXX-with-dependencies.jar
...
51Copyright©2016 NTT corp. All Rights Reserved.
• 1. Fetch training and test data
• 2. Load these data in Spark
• 3. Build a model
• 4. Do predictions
4 Step Example
52Copyright©2016 NTT corp. All Rights Reserved.
• E2006 tfidf regression dataset
• http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/dat
asets/regression.html#E2006-tfidf
1. Fetch training and test data
$ wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
regression/E2006.train.bz2
$ wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
regression/E2006.test.bz2
53Copyright©2016 NTT corp. All Rights Reserved.
2. Load data in Spark
// Download Spark-v2.1 and launch a spark-shell with Hivemall
$ <HIVEMALL_HOME>/bin/spark-shell
// Create DataFrame from the bzipʼd libsvm-formatted file
scala> val trainDf = spark.read.format("libsvm”).load(“E2006.train.bz2")
scala> trainDf.printSchema
root
|-- label: double (nullable = true)
|-- features: vector (nullable = true)
54Copyright©2016 NTT corp. All Rights Reserved.
2. Load data in Spark (Detailed)
0.000357499151147113 6066:0.0007932706219604
8 6069:0.000311377727123504 6070:0.0003067549
34580457 6071:0.000276992485786437 6072:0.000
39663531098024 6074:0.00039663531098024 6075
:0.00032548335…
trainDf
Partition1 Partition2 Partition3 PartitionN
…
…
…
Load in parallel because
bzip2 is splittable
55Copyright©2016 NTT corp. All Rights Reserved.
3. Build a model - DataFrame
scala> paste:
val modelDf = trainDf.train_logregr($"features", $"label")
.groupBy("feature”)
.agg("weight" -> "avg”)
56Copyright©2016 NTT corp. All Rights Reserved.
3. Build a model - SQL
scala> trainDf.createOrReplaceTempView("TrainTable")
scala> paste:
val modelDf = sql("""
| SELECT feature, AVG(weight) AS weight
| FROM (
| SELECT train_logregr(features, label)
| AS (feature, weight)
| FROM TrainTable
| )
| GROUP BY feature
""".stripMargin)
57Copyright©2016 NTT corp. All Rights Reserved.
4. Do predictions - DataFrame
scala> paste:
val df = testDf.select(rowid(), $"features")
.explode_vector($"features")
.cache
# Do predictions
df.join(modelDf, df("feature") === model("feature"), "LEFT_OUTER")
.groupBy("rowid")
.avg(sigmoid(sum($"weight" * $"value")))
58Copyright©2016 NTT corp. All Rights Reserved.
4. Do predictions - SQL
scala> modelDf.createOrReplaceTempView(”ModelTable")
scala> df.createOrReplaceTempView(”TestTable”)
scala> paste:
sql("""
| SELECT rowid, sigmoid(value * weight) AS predicted
| FROM TrainTable t
| LEFT OUTER JOIN ModelTable m
| ON t.feature = m.feature
| GROUP BY rowid
""".stripMargin)
59Copyright©2016 NTT corp. All Rights Reserved.
• Top-K Join
• Join two relations and compute top-K for each group
Add New Features in Spark
scala> paste:
val topkDf = leftDf.join(rightDf, “group” :: Nil, “INNER”)
.select(leftDf("group"), (leftDf(“x”) + rightDf(“y”)).as(“score”)
.withColumn(
"rank",
rank().over(partitionBy($"group”).orderBy($"score".desc))
)
.where($"rank" <= topK)
Use Spark vanilla APIs
60Copyright©2016 NTT corp. All Rights Reserved.
• Top-K Join
• Join two relations and compute top-K for each group
Add New Features in Spark
scala> paste:
val topkDf = leftDf.join(rightDf, “group” :: Nil, “INNER”)
.select(leftDf("group"), (leftDf(“x”) + rightDf(“y”)).as(“score”)
.withColumn(
"rank", rank().over(partitionBy($"group”).orderBy($"score".desc))
)
.where($"rank" <= topK)
1. Join leftDf with rightDf
2. Sort data in each group
3. Filter top-K data
Use Spark vanilla APIs
61Copyright©2016 NTT corp. All Rights Reserved.
• Top-K Join
• Join two relations and compute top-K for each group
Add New Features in Spark
scala> paste:
val topkDf = leftDf.top_k_join(
lit(topK), rightDf, leftDf(“group”) === rightDf(“group”),
(leftDf(“x”) + rightDf(“y”)).as(“score”)
)
Use a fused API implemented in Hivemall
62Copyright©2016 NTT corp. All Rights Reserved.
• How top_k_join works?
Add New Features in Spark
group x
group y
leftDf
rightDf
・・・
63Copyright©2016 NTT corp. All Rights Reserved.
• How top_k_join works?
Add New Features in Spark
group x
group y
・・・
・・・
K-length
priority queue
Compute top-K rows
by using a priority queue
leftDf
rightDf
64Copyright©2016 NTT corp. All Rights Reserved.
• How top_k_join works?
Add New Features in Spark
group x
group y
・・・
・・・
K-length
priority queue
Compute top-K rows
by using a priority queue
Only join top-K rowsleftDf
rightDf
65Copyright©2016 NTT corp. All Rights Reserved.
• Codegenʼd top_k_join for fast processing
• Spark internally generates Java code from a built
physical plan, and compiles/executes it
Add New Features in Spark
Spark Planner (Catalyst) Overview
66Copyright©2016 NTT corp. All Rights Reserved.
• Codegenʼd top_k_join for fast processing
• Spark internally generates Java code from a built
physical plan, and compiles/executes it
Add New Features in Spark
Codegen
A Physical Plan
67Copyright©2016 NTT corp. All Rights Reserved.
• Codegenʼd top_k_join for fast processing
• Spark internally generates Java code from a built
physical plan, and compiles/executes it
Add New Features in Spark
scala> topkDf.explain
== Physical Plan ==
*ShuffledHashJoinTopK 1, [group#10], [group#27]
:- Exchange hashpartitioning(group#10, 200)
: +- LocalTableScan [group#10, x#11]
+- Exchange hashpartitioning(group#27, 200)
+- LocalTableScan [group#27, y#28]
ʻ*ʼ in the head means a codegenʼd plan
68Copyright©2016 NTT corp. All Rights Reserved.
• Benchmark Result
Add New Features in Spark
~13 times faster than vanilla APIs!!
69Copyright©2016 NTT corp. All Rights Reserved.
• Support more useful functions
• Spark only implements naive and basic functions in
terms of usability and maintainability
• e.g.,)
• flatten - flatten a nested schema into flat one
• from_csv/to_csv – interconversion of CSV strings and structured data with schemas
• ...
• See more in the Hivemall user guide
• https://hivemall.incubator.apache.org /userguide /spark /misc /misc.html
Add New Features in Spark
70Copyright©2016 NTT corp. All Rights Reserved.
• Spark has overheads to call Hive UD*Fs
• Hivemall heavily depends on them
• ex.1) Compute a sigmoid function
Improve Some Functions in Spark
scala> val sigmoidFunc = (d: Double) => 1.0 / (1.0 + Math.exp(-d))
scala> val sparkUdf = functions.udf(sigmoidFunc)
scala> df.select(sparkUdf($“value”))
71Copyright©2016 NTT corp. All Rights Reserved.
• Spark has overheads to call Hive UD*Fs
• Hivemall heavily depends on them
• ex.1) Compute a sigmoid function
Improve Some Functions in Spark
scala> val hiveUdf = HivemallOps.sigmoid
scala> df.select(hiveUdf($“value”))
72Copyright©2016 NTT corp. All Rights Reserved.
• Spark has overheads to call Hive UD*Fs
• Hivemall heavily depends on them
• ex.1) Compute a sigmoid function
Improve Some Functions in Spark
73Copyright©2016 NTT corp. All Rights Reserved.
• Spark has overheads to call Hive UD*Fs
• Hivemall heavily depends on them
• ex.2) Compute top-k for each key group
Improve Some Functions in Spark
scala> paste:
df.withColumn(
“rank”,
rank().over(Window.partitionBy($"key").orderBy($"score".desc)
) .where($"rank" <= topK)
74Copyright©2016 NTT corp. All Rights Reserved.
• Spark has overheads to call Hive UD*Fs
• Hivemall heavily depends on them
• ex.2) Compute top-k for each key group
• Fixed the overhead issue for each_top_k
• See pr#353: “Implement EachTopK as a generator expression“ in Spark
Improve Some Functions in Spark
scala> df.each_top_k(topK, “key”, “score”, “value”)
75Copyright©2016 NTT corp. All Rights Reserved.
• Spark has overheads to call Hive UD*Fs
• Hivemall heavily depends on them
• ex.2) Compute top-k for each key group
Improve Some Functions in Spark
~4 times faster than rank!!
76Copyright©2016 NTT corp. All Rights Reserved.
• supports under development
• fast implementation of the gradient tree boosting
• widely used in Kaggle competitions
• This integration will make you...
• load built models and predict in parallel
• build multiple models in parallel
for cross-validation
Support 3rd Party Library in Spark
Conclusion	and	Takeaway
Hivemall	is	a	multi/cross-platform ML	library
providing	a	collection	of	machine	learning	algorithms	as	Hive	UDFs/UDTFs
2017/5/16	Apache	BigData	North	America	'17,	Miami
We	welcome	your	contributions	to	Apache	Hivemall	J
HiveQL SparkSQL/Dataframe API Pig	Latin
2017/5/16	Apache	BigData	North	America	'17,	Miami
Any	feature	request	or	questions?

Contenu connexe

Tendances

The Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanThe Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanLuke Han
 
Fast, Scalable Graph Processing: Apache Giraph on YARN
Fast, Scalable Graph Processing: Apache Giraph on YARNFast, Scalable Graph Processing: Apache Giraph on YARN
Fast, Scalable Graph Processing: Apache Giraph on YARNDataWorks Summit
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende
 
PyCon Singapore 2013 Keynote
PyCon Singapore 2013 KeynotePyCon Singapore 2013 Keynote
PyCon Singapore 2013 KeynoteWes McKinney
 
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...Luke Han
 
Hw09 Hadoop Applications At Yahoo!
Hw09   Hadoop Applications At Yahoo!Hw09   Hadoop Applications At Yahoo!
Hw09 Hadoop Applications At Yahoo!Cloudera, Inc.
 
Apache Kylin Introduction
Apache Kylin IntroductionApache Kylin Introduction
Apache Kylin IntroductionLuke Han
 
Spark Summit EU talk by Dean Wampler
Spark Summit EU talk by Dean WamplerSpark Summit EU talk by Dean Wampler
Spark Summit EU talk by Dean WamplerSpark Summit
 
Hadoop Graph Processing with Apache Giraph
Hadoop Graph Processing with Apache GiraphHadoop Graph Processing with Apache Giraph
Hadoop Graph Processing with Apache GiraphDataWorks Summit
 
Hivemall talk@Hadoop summit 2014, San Jose
Hivemall talk@Hadoop summit 2014, San JoseHivemall talk@Hadoop summit 2014, San Jose
Hivemall talk@Hadoop summit 2014, San JoseMakoto Yui
 
Improving data interoperability in Python and R
Improving data interoperability in Python and RImproving data interoperability in Python and R
Improving data interoperability in Python and RWes McKinney
 
Spark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with SparkSpark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with SparkDatabricks
 
Productive Data Tools for Quants
Productive Data Tools for QuantsProductive Data Tools for Quants
Productive Data Tools for QuantsWes McKinney
 
Janus graph lookingbackwardreachingforward
Janus graph lookingbackwardreachingforwardJanus graph lookingbackwardreachingforward
Janus graph lookingbackwardreachingforwardDemai Ni
 
Solr Drupal 8 Integration - Drupal Camp Manila
Solr Drupal 8 Integration - Drupal Camp ManilaSolr Drupal 8 Integration - Drupal Camp Manila
Solr Drupal 8 Integration - Drupal Camp ManilaArradi Nur Rizal
 
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...DataWorks Summit/Hadoop Summit
 
H2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudH2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudSri Ambati
 

Tendances (20)

The Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanThe Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke Han
 
Fast, Scalable Graph Processing: Apache Giraph on YARN
Fast, Scalable Graph Processing: Apache Giraph on YARNFast, Scalable Graph Processing: Apache Giraph on YARN
Fast, Scalable Graph Processing: Apache Giraph on YARN
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conference
 
PyCon Singapore 2013 Keynote
PyCon Singapore 2013 KeynotePyCon Singapore 2013 Keynote
PyCon Singapore 2013 Keynote
 
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
 
Hw09 Hadoop Applications At Yahoo!
Hw09   Hadoop Applications At Yahoo!Hw09   Hadoop Applications At Yahoo!
Hw09 Hadoop Applications At Yahoo!
 
Apache Kylin Introduction
Apache Kylin IntroductionApache Kylin Introduction
Apache Kylin Introduction
 
Spark Summit EU talk by Dean Wampler
Spark Summit EU talk by Dean WamplerSpark Summit EU talk by Dean Wampler
Spark Summit EU talk by Dean Wampler
 
Hadoop Graph Processing with Apache Giraph
Hadoop Graph Processing with Apache GiraphHadoop Graph Processing with Apache Giraph
Hadoop Graph Processing with Apache Giraph
 
Hivemall talk@Hadoop summit 2014, San Jose
Hivemall talk@Hadoop summit 2014, San JoseHivemall talk@Hadoop summit 2014, San Jose
Hivemall talk@Hadoop summit 2014, San Jose
 
ISAX
ISAXISAX
ISAX
 
Improving data interoperability in Python and R
Improving data interoperability in Python and RImproving data interoperability in Python and R
Improving data interoperability in Python and R
 
Spark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with SparkSpark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with Spark
 
Productive Data Tools for Quants
Productive Data Tools for QuantsProductive Data Tools for Quants
Productive Data Tools for Quants
 
Apache Zeppelin, Helium and Beyond
Apache Zeppelin, Helium and BeyondApache Zeppelin, Helium and Beyond
Apache Zeppelin, Helium and Beyond
 
Janus graph lookingbackwardreachingforward
Janus graph lookingbackwardreachingforwardJanus graph lookingbackwardreachingforward
Janus graph lookingbackwardreachingforward
 
Solr Drupal 8 Integration - Drupal Camp Manila
Solr Drupal 8 Integration - Drupal Camp ManilaSolr Drupal 8 Integration - Drupal Camp Manila
Solr Drupal 8 Integration - Drupal Camp Manila
 
JBCN barcelona 2017 kappa architecture 2.0
JBCN barcelona 2017 kappa architecture 2.0JBCN barcelona 2017 kappa architecture 2.0
JBCN barcelona 2017 kappa architecture 2.0
 
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
 
H2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudH2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks Cloud
 

En vedette

Bio bigdata
Bio bigdata Bio bigdata
Bio bigdata Mk Kim
 
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overviewharithakannan
 
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012Preferred Networks
 
SQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureSQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureVenu Anuganti
 
Data Analytics Practice at Paxcel
Data Analytics Practice at PaxcelData Analytics Practice at Paxcel
Data Analytics Practice at PaxcelPushpinder Singh
 
Introducing Agile Scrum XP and Kanban
Introducing Agile Scrum XP and KanbanIntroducing Agile Scrum XP and Kanban
Introducing Agile Scrum XP and KanbanDimitri Ponomareff
 

En vedette (7)

Bio bigdata
Bio bigdata Bio bigdata
Bio bigdata
 
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overview
 
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
 
SQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureSQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data Architecture
 
Data Analytics Practice at Paxcel
Data Analytics Practice at PaxcelData Analytics Practice at Paxcel
Data Analytics Practice at Paxcel
 
Introducing Agile Scrum XP and Kanban
Introducing Agile Scrum XP and KanbanIntroducing Agile Scrum XP and Kanban
Introducing Agile Scrum XP and Kanban
 
BDaas- BigData as a service
BDaas- BigData as a service  BDaas- BigData as a service
BDaas- BigData as a service
 

Similaire à Apache Hivemall @ Apache BigData '17, Miami

Idea behind Apache Hivemall
Idea behind Apache HivemallIdea behind Apache Hivemall
Idea behind Apache HivemallMakoto Yui
 
Apache Hivemall and my OSS experience
Apache Hivemall and my OSS experienceApache Hivemall and my OSS experience
Apache Hivemall and my OSS experienceMakoto Yui
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiSlim Baltagi
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiSlim Baltagi
 
Introduction to Apache Hivemall v0.5.2 and v0.6
Introduction to Apache Hivemall v0.5.2 and v0.6Introduction to Apache Hivemall v0.5.2 and v0.6
Introduction to Apache Hivemall v0.5.2 and v0.6Makoto Yui
 
An Overview Of Apache Pig And Apache Hive
An Overview Of Apache Pig And Apache HiveAn Overview Of Apache Pig And Apache Hive
An Overview Of Apache Pig And Apache HiveJoe Andelija
 
Introduction to Hivemall
Introduction to HivemallIntroduction to Hivemall
Introduction to HivemallMakoto Yui
 
Apache Flink community Update for March 2016 - Slim Baltagi
Apache Flink community Update for March 2016 - Slim BaltagiApache Flink community Update for March 2016 - Slim Baltagi
Apache Flink community Update for March 2016 - Slim BaltagiSlim Baltagi
 
Everything You Need to Know About the Intel® MPI Library
Everything You Need to Know About the Intel® MPI LibraryEverything You Need to Know About the Intel® MPI Library
Everything You Need to Know About the Intel® MPI LibraryIntel® Software
 
A short introduction to Spark and its benefits
A short introduction to Spark and its benefitsA short introduction to Spark and its benefits
A short introduction to Spark and its benefitsJohan Picard
 
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...Deep Learning Italia
 
Guidance, Code and Education: ScalaCenter and the Scala Community, Heather Mi...
Guidance, Code and Education: ScalaCenter and the Scala Community, Heather Mi...Guidance, Code and Education: ScalaCenter and the Scala Community, Heather Mi...
Guidance, Code and Education: ScalaCenter and the Scala Community, Heather Mi...OW2
 
Unified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkUnified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkSlim Baltagi
 
MPI Raspberry pi 3 cluster
MPI Raspberry pi 3 clusterMPI Raspberry pi 3 cluster
MPI Raspberry pi 3 clusterArafat Hussain
 
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Mich Talebzadeh (Ph.D.)
 
Introduction to pyspark new
Introduction to pyspark newIntroduction to pyspark new
Introduction to pyspark newAnam Mahmood
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideWhizlabs
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksDataWorks Summit/Hadoop Summit
 
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksOverview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksSlim Baltagi
 

Similaire à Apache Hivemall @ Apache BigData '17, Miami (20)

Idea behind Apache Hivemall
Idea behind Apache HivemallIdea behind Apache Hivemall
Idea behind Apache Hivemall
 
Apache Hivemall and my OSS experience
Apache Hivemall and my OSS experienceApache Hivemall and my OSS experience
Apache Hivemall and my OSS experience
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
 
Introduction to Apache Hivemall v0.5.2 and v0.6
Introduction to Apache Hivemall v0.5.2 and v0.6Introduction to Apache Hivemall v0.5.2 and v0.6
Introduction to Apache Hivemall v0.5.2 and v0.6
 
An Overview Of Apache Pig And Apache Hive
An Overview Of Apache Pig And Apache HiveAn Overview Of Apache Pig And Apache Hive
An Overview Of Apache Pig And Apache Hive
 
Introduction to Hivemall
Introduction to HivemallIntroduction to Hivemall
Introduction to Hivemall
 
Apache Flink community Update for March 2016 - Slim Baltagi
Apache Flink community Update for March 2016 - Slim BaltagiApache Flink community Update for March 2016 - Slim Baltagi
Apache Flink community Update for March 2016 - Slim Baltagi
 
Everything You Need to Know About the Intel® MPI Library
Everything You Need to Know About the Intel® MPI LibraryEverything You Need to Know About the Intel® MPI Library
Everything You Need to Know About the Intel® MPI Library
 
A short introduction to Spark and its benefits
A short introduction to Spark and its benefitsA short introduction to Spark and its benefits
A short introduction to Spark and its benefits
 
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
 
Guidance, Code and Education: ScalaCenter and the Scala Community, Heather Mi...
Guidance, Code and Education: ScalaCenter and the Scala Community, Heather Mi...Guidance, Code and Education: ScalaCenter and the Scala Community, Heather Mi...
Guidance, Code and Education: ScalaCenter and the Scala Community, Heather Mi...
 
Unified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkUnified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache Flink
 
MPI Raspberry pi 3 cluster
MPI Raspberry pi 3 clusterMPI Raspberry pi 3 cluster
MPI Raspberry pi 3 cluster
 
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
 
Introduction to pyspark new
Introduction to pyspark newIntroduction to pyspark new
Introduction to pyspark new
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive Guide
 
Big data analytics use case and software
Big data analytics use case and softwareBig data analytics use case and software
Big data analytics use case and software
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
 
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksOverview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
 

Plus de Makoto Yui

Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0Makoto Yui
 
Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0Makoto Yui
 
What's new in Hivemall v0.5.0
What's new in Hivemall v0.5.0What's new in Hivemall v0.5.0
What's new in Hivemall v0.5.0Makoto Yui
 
What's new in Apache Hivemall v0.5.0
What's new in Apache Hivemall v0.5.0What's new in Apache Hivemall v0.5.0
What's new in Apache Hivemall v0.5.0Makoto Yui
 
Revisiting b+-trees
Revisiting b+-treesRevisiting b+-trees
Revisiting b+-treesMakoto Yui
 
Incubating Apache Hivemall
Incubating Apache HivemallIncubating Apache Hivemall
Incubating Apache HivemallMakoto Yui
 
Hivemall meets Digdag @Hackertackle 2018-02-17
Hivemall meets Digdag @Hackertackle 2018-02-17Hivemall meets Digdag @Hackertackle 2018-02-17
Hivemall meets Digdag @Hackertackle 2018-02-17Makoto Yui
 
機械学習のデータ並列処理@第7回BDI研究会
機械学習のデータ並列処理@第7回BDI研究会機械学習のデータ並列処理@第7回BDI研究会
機械学習のデータ並列処理@第7回BDI研究会Makoto Yui
 
Dots20161029 myui
Dots20161029 myuiDots20161029 myui
Dots20161029 myuiMakoto Yui
 
Recommendation 101 using Hivemall
Recommendation 101 using HivemallRecommendation 101 using Hivemall
Recommendation 101 using HivemallMakoto Yui
 
Hivemall dbtechshowcase 20160713 #dbts2016
Hivemall dbtechshowcase 20160713 #dbts2016Hivemall dbtechshowcase 20160713 #dbts2016
Hivemall dbtechshowcase 20160713 #dbts2016Makoto Yui
 
Tdtechtalk20160425myui
Tdtechtalk20160425myuiTdtechtalk20160425myui
Tdtechtalk20160425myuiMakoto Yui
 
Tdtechtalk20160330myui
Tdtechtalk20160330myuiTdtechtalk20160330myui
Tdtechtalk20160330myuiMakoto Yui
 
Datascientistsymp1113
Datascientistsymp1113Datascientistsymp1113
Datascientistsymp1113Makoto Yui
 
2nd Hivemall meetup 20151020
2nd Hivemall meetup 201510202nd Hivemall meetup 20151020
2nd Hivemall meetup 20151020Makoto Yui
 
Talk about Hivemall at Data Scientist Organization on 2015/09/17
Talk about Hivemall at Data Scientist Organization on 2015/09/17Talk about Hivemall at Data Scientist Organization on 2015/09/17
Talk about Hivemall at Data Scientist Organization on 2015/09/17Makoto Yui
 
Db tech show - hivemall
Db tech show - hivemallDb tech show - hivemall
Db tech show - hivemallMakoto Yui
 
Hivemall tech talk at Redwood, CA
Hivemall tech talk at Redwood, CAHivemall tech talk at Redwood, CA
Hivemall tech talk at Redwood, CAMakoto Yui
 
Hivemall Talk at TD tech talk #3
Hivemall Talk at TD tech talk #3Hivemall Talk at TD tech talk #3
Hivemall Talk at TD tech talk #3Makoto Yui
 
Hivemall v0.3の機能紹介@1st Hivemall meetup
Hivemall v0.3の機能紹介@1st Hivemall meetupHivemall v0.3の機能紹介@1st Hivemall meetup
Hivemall v0.3の機能紹介@1st Hivemall meetupMakoto Yui
 

Plus de Makoto Yui (20)

Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0
 
Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0
 
What's new in Hivemall v0.5.0
What's new in Hivemall v0.5.0What's new in Hivemall v0.5.0
What's new in Hivemall v0.5.0
 
What's new in Apache Hivemall v0.5.0
What's new in Apache Hivemall v0.5.0What's new in Apache Hivemall v0.5.0
What's new in Apache Hivemall v0.5.0
 
Revisiting b+-trees
Revisiting b+-treesRevisiting b+-trees
Revisiting b+-trees
 
Incubating Apache Hivemall
Incubating Apache HivemallIncubating Apache Hivemall
Incubating Apache Hivemall
 
Hivemall meets Digdag @Hackertackle 2018-02-17
Hivemall meets Digdag @Hackertackle 2018-02-17Hivemall meets Digdag @Hackertackle 2018-02-17
Hivemall meets Digdag @Hackertackle 2018-02-17
 
機械学習のデータ並列処理@第7回BDI研究会
機械学習のデータ並列処理@第7回BDI研究会機械学習のデータ並列処理@第7回BDI研究会
機械学習のデータ並列処理@第7回BDI研究会
 
Dots20161029 myui
Dots20161029 myuiDots20161029 myui
Dots20161029 myui
 
Recommendation 101 using Hivemall
Recommendation 101 using HivemallRecommendation 101 using Hivemall
Recommendation 101 using Hivemall
 
Hivemall dbtechshowcase 20160713 #dbts2016
Hivemall dbtechshowcase 20160713 #dbts2016Hivemall dbtechshowcase 20160713 #dbts2016
Hivemall dbtechshowcase 20160713 #dbts2016
 
Tdtechtalk20160425myui
Tdtechtalk20160425myuiTdtechtalk20160425myui
Tdtechtalk20160425myui
 
Tdtechtalk20160330myui
Tdtechtalk20160330myuiTdtechtalk20160330myui
Tdtechtalk20160330myui
 
Datascientistsymp1113
Datascientistsymp1113Datascientistsymp1113
Datascientistsymp1113
 
2nd Hivemall meetup 20151020
2nd Hivemall meetup 201510202nd Hivemall meetup 20151020
2nd Hivemall meetup 20151020
 
Talk about Hivemall at Data Scientist Organization on 2015/09/17
Talk about Hivemall at Data Scientist Organization on 2015/09/17Talk about Hivemall at Data Scientist Organization on 2015/09/17
Talk about Hivemall at Data Scientist Organization on 2015/09/17
 
Db tech show - hivemall
Db tech show - hivemallDb tech show - hivemall
Db tech show - hivemall
 
Hivemall tech talk at Redwood, CA
Hivemall tech talk at Redwood, CAHivemall tech talk at Redwood, CA
Hivemall tech talk at Redwood, CA
 
Hivemall Talk at TD tech talk #3
Hivemall Talk at TD tech talk #3Hivemall Talk at TD tech talk #3
Hivemall Talk at TD tech talk #3
 
Hivemall v0.3の機能紹介@1st Hivemall meetup
Hivemall v0.3の機能紹介@1st Hivemall meetupHivemall v0.3の機能紹介@1st Hivemall meetup
Hivemall v0.3の機能紹介@1st Hivemall meetup
 

Dernier

YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.JasonViviers2
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptaigil2
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024Becky Burwell
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)Data & Analytics Magazin
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 

Dernier (17)

YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .ppt
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 

Apache Hivemall @ Apache BigData '17, Miami