Inducing Predictive Clustering Trees for Datatype properties Values
1. Inducing Predictive Clustering Trees for Datatype properties Values
Giuseppe Rizzo, Claudia d’Amato, Nicola Fanizzi, Floriana Esposito
Semantic Machine Learning, 10th July 2016
G.Rizzo et al. (Univ. of Bari) 10th July 2016 1 / 18
2. Outline
1 The Context and Motivations
2 Basics
3 The approach
4 Empirical Evaluation
5 Conclusion & Further Extensions
3. The Context and Motivations
• Goal: approximating the (numerical) datatype property values
through regression models in the Web of Data
• Web of Data: a large number of knowledge bases, datasets and
vocabularies exposed in a standard format (RDF, OWL)
• (numerical) property values can hardly be derived by using
reasoning services
• Open World Assumption
• a large amount of missing information
• The information gap can be filled by using regression models
4. The Context and Motivations
• Solving a regression problem
• two or more property values may be related (e.g. crime rate and
population of a place)
• exploiting such correlations should improve predictive accuracy
• Predicting several numerical values at once (multi-target
regression) through Predictive Clustering approaches
• Predictive Clustering Trees (PCTs) as a generalization of decision
trees
• PCTs compliant with the representation languages for the Web of
Data (e.g. Description Logics)
• target values: the numeric role fillers for the properties
5. Description Logics
Syntax & Semantics
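• Atomic concepts (classes) $N_C$ and roles (relations) $N_R$ to model domains
• Operators to build complex concept descriptions
• Concrete domains: string, boolean, numeric values
• Semantics defined through interpretations $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$
• $\Delta^{\mathcal{I}}$: domain of the interpretation
• $\cdot^{\mathcal{I}}$: interpretation function
• for each concept $C \in N_C$, $C^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$
• for each role $R \in N_R$, $R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$
ALC operators (name: syntax, semantics)
Top concept: $\top$, $\Delta^{\mathcal{I}}$
Bottom concept: $\bot$, $\emptyset$
Concept: $C$, $C^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$
Full complement: $\neg C$, $\Delta^{\mathcal{I}} \setminus C^{\mathcal{I}}$
Intersection: $C \sqcap D$, $C^{\mathcal{I}} \cap D^{\mathcal{I}}$
Disjunction: $C \sqcup D$, $C^{\mathcal{I}} \cup D^{\mathcal{I}}$
Universal restriction: $\forall R.D$, $\{x \in \Delta^{\mathcal{I}} \mid \forall y \in \Delta^{\mathcal{I}}\ ((x, y) \in R^{\mathcal{I}} \rightarrow y \in D^{\mathcal{I}})\}$
Existential restriction: $\exists R.D$, $\{x \in \Delta^{\mathcal{I}} \mid \exists y \in \Delta^{\mathcal{I}}\ ((x, y) \in R^{\mathcal{I}} \wedge y \in D^{\mathcal{I}})\}$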
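The semantics of the ALC operators can be made concrete on a small finite interpretation. The following is a minimal sketch, not part of the original slides: the tuple encoding of concepts, the function name `extension` and the toy data are all assumptions chosen for illustration.

```python
def extension(concept, domain, concept_ext, role_ext):
    """Return the set of domain elements belonging to `concept`.

    Concepts are nested tuples (a hypothetical encoding, not from the slides):
    "Top", "Bottom", ("atom", A), ("not", C), ("and", C, D), ("or", C, D),
    ("all", R, C), ("some", R, C).
    """
    def ext(c):
        return extension(c, domain, concept_ext, role_ext)

    if concept == "Top":
        return set(domain)
    if concept == "Bottom":
        return set()
    op = concept[0]
    if op == "atom":
        return set(concept_ext.get(concept[1], ()))
    if op == "not":
        return set(domain) - ext(concept[1])
    if op == "and":
        return ext(concept[1]) & ext(concept[2])
    if op == "or":
        return ext(concept[1]) | ext(concept[2])
    if op in ("all", "some"):
        pairs = role_ext.get(concept[1], ())
        filler = ext(concept[2])
        members = set()
        for x in domain:
            # All y such that (x, y) belongs to the role extension.
            successors = {y for (s, y) in pairs if s == x}
            if op == "all" and successors <= filler:
                members.add(x)
            elif op == "some" and successors & filler:
                members.add(x)
        return members
    raise ValueError(f"unknown concept constructor: {concept!r}")


# Toy interpretation (illustrative data only).
domain = {"film1", "film2", "actor1"}
concept_ext = {"Comedy": {"film1"}, "Actor": {"actor1"}}
role_ext = {"starring": {("film1", "actor1")}}

starring_actor = extension(("some", "starring", ("atom", "Actor")),
                           domain, concept_ext, role_ext)
```

Note that the universal restriction holds vacuously for elements without any role successor, so in this toy interpretation `("all", "starring", ("atom", "Actor"))` covers the whole domain.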
6. Description Logics
Knowledge bases
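• Knowledge base: a pair $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ where
• $\mathcal{T}$ (TBox): axioms concerning concepts/roles
• Subsumption axioms $C \sqsubseteq D$: hold iff for every interpretation $\mathcal{I}$, $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$
• Equivalence axioms $C \equiv D$: hold iff for every interpretation $\mathcal{I}$, $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ and $D^{\mathcal{I}} \subseteq C^{\mathcal{I}}$
• $\mathcal{A}$ (ABox): class assertions $C(a)$ and role assertions $R(a, b)$ about a set of individuals, denoted by $\mathrm{Ind}(\mathcal{A})$
• Reasoning services:
• subsumption: checking whether a concept is more general than a given one
• satisfiability: given a concept description $C$, there is an interpretation $\mathcal{I}$ such that $C^{\mathcal{I}} \neq \emptyset$
• instance checking: for every interpretation $\mathcal{I}$ that is a model of $\mathcal{K}$, $\mathcal{I} \models C(a)$ holds ($a$ is an instance of $C$)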
7. The problem
Given:
• a knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{A})$;
• the target functional roles $R_i$, $1 \le i \le t$, ranging over the domains $D_i$, whose analytic forms are unknown;
• a training set $Tr \subseteq \mathrm{Ind}(\mathcal{A})$ for which the numeric fillers are known,
$Tr = \{a \in \mathrm{Ind}(\mathcal{A}) \mid R_i(a, v_i) \in \mathcal{A},\ v_i \in D_i,\ 1 \le i \le t\}$
Build a regression model for $\{R_i\}_{i=1}^{t}$, i.e. a function
$h : \mathrm{Ind}(\mathcal{A}) \to D_1 \times \cdots \times D_t$ that minimizes a loss function over
$Tr$. A possible loss function may be based on the mean square error.
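As an illustration of a loss based on the mean square error, the sketch below averages the squared differences between predicted and known fillers over all training individuals and all $t$ target properties. The function name `multi_target_mse` and the equal weighting of the targets are assumptions; the slides do not fix the exact form of the loss.

```python
def multi_target_mse(predictions, targets):
    """Mean squared error averaged over individuals and target properties.

    `predictions[j]` is the vector h(a_j) and `targets[j]` the vector of known
    fillers (v_1, ..., v_t) for the j-th training individual.
    """
    if not targets or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and aligned")
    total = 0.0
    count = 0
    for predicted, actual in zip(predictions, targets):
        for p, v in zip(predicted, actual):
            total += (p - v) ** 2
            count += 1
    return total / count


# Two individuals, two target properties (illustrative numbers only).
example_loss = multi_target_mse([(1.0, 2.0), (3.0, 4.0)],
                                [(1.0, 4.0), (1.0, 4.0)])
```

With unscaled targets, a property with a large range dominates this average, which is one reason to standardize the target values before comparing errors.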
8. The proposed solution
• Predictive Clustering
• objects are clustered according to a homogeneity criterion
• for each cluster a predictive model is determined (e.g. a vector
containing predictions)
[Figure: (a) clustering, (b) predictive models, (c) predictive clustering]
9. The model for multi-target regression
• Given a knowledge base K, a PCT for multi-target regression is a binary tree where
• intermediate nodes: DL concept descriptions
• leaf nodes: vectors containing the predictions w.r.t. the target properties
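[Figure: example PCT. Inner nodes are labelled Comedy, Comedy starring.Actor and ¬Comedy ¬Horror; the four leaves hold the prediction vectors p = (8.45, 9810666), p = (5.38, 4200000), p = (4.7, 4200000) and p = (8.6, 4930000)]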
10. Learning PCTs
• Divide-and-conquer strategy
• For the current node:
• the refinement operator generates the candidate concepts
• the most promising concept $E^*$ is selected by maximizing the
homogeneity w.r.t. all the target variables simultaneously
• Best concept: the one minimizing the RMSE of the standardized
target property values
• Stop conditions:
• maximum number of levels
• size of the training (sub)set
• Leaf: the i-th component contains the average value of the i-th
target property over the instances routed to the node
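The selection step above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: concept membership is given as precomputed booleans (in the actual setting it would come from a reasoner and may be unknown under the Open World Assumption), and the names `standardize`, `split_rmse`, `best_concept` and `leaf_prediction` are hypothetical.

```python
import math


def standardize(targets):
    """Rescale each target column to zero mean and unit standard deviation."""
    n, t = len(targets), len(targets[0])
    means = [sum(row[i] for row in targets) / n for i in range(t)]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((row[i] - means[i]) ** 2 for row in targets) / n) or 1.0
            for i in range(t)]
    return [[(row[i] - means[i]) / stds[i] for i in range(t)] for row in targets]


def leaf_prediction(targets):
    """Average vector: the i-th component is the mean of the i-th target."""
    n = len(targets)
    return [sum(row[i] for row in targets) / n for i in range(len(targets[0]))]


def split_rmse(z_targets, membership):
    """RMSE of the standardized targets around the mean vector of each branch."""
    squared_error = 0.0
    for branch in (True, False):
        rows = [z for z, member in zip(z_targets, membership) if member == branch]
        if not rows:
            continue
        centre = leaf_prediction(rows)
        squared_error += sum((z[i] - centre[i]) ** 2
                             for z in rows for i in range(len(centre)))
    return math.sqrt(squared_error / (len(z_targets) * len(z_targets[0])))


def best_concept(candidates, targets):
    """Pick the candidate whose split gives the lowest RMSE on all targets jointly."""
    z_targets = standardize(targets)
    return min(candidates, key=lambda name: split_rmse(z_targets, candidates[name]))


# Illustrative data: two targets per individual, two candidate concepts.
targets = [(8.0, 9.0e6), (9.0, 9.8e6), (5.0, 4.0e6), (6.0, 4.2e6)]
candidates = {
    "Comedy": [True, True, False, False],
    "Horror": [True, False, True, False],
}
chosen = best_concept(candidates, targets)
```

Standardizing first keeps a target with a large numeric range from dominating the joint error, so both properties influence which concept is installed.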
11. Installing new DL concepts as inner nodes
• The candidate concept descriptions are generated by using a
refinement operator
• A quasi-ordering relation over the space of concept
descriptions
• The subsumption between concepts in Description Logics
• Downward refinement operator ρ(·) to obtain specializations E of
a concept description D ($E \sqsubseteq D$)
• Each specialization can be obtained:
• by introducing a new concept name (or its complement) as a
conjunct
• by replacing a sub-description in the scope of an existential
restriction
• by replacing a sub-description in the scope of a universal
restriction
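The three refinement rules listed above can be sketched as a generator over a tuple encoding of concepts. This is a deliberately reduced illustration: the encoding, the function name `refine` and the restriction to top-level restrictions are assumptions, and the operator actually used in the work may generate more (or different) candidates.

```python
def refine(concept, concept_names):
    """Yield one-step specializations of `concept` (simplified downward refinement).

    Concepts use a hypothetical tuple encoding: "Top", ("atom", A), ("not", C),
    ("and", C, D), ("some", R, C), ("all", R, C).
    """
    # Rule 1: add a concept name, or its complement, as a new conjunct.
    for name in concept_names:
        yield ("and", concept, ("atom", name))
        yield ("and", concept, ("not", ("atom", name)))
    # Rules 2 and 3: specialize the filler of an existential or universal restriction.
    if isinstance(concept, tuple) and concept[0] in ("some", "all"):
        quantifier, role, filler = concept
        for refined_filler in refine(filler, concept_names):
            yield (quantifier, role, refined_filler)


# Illustrative call: refinements of "something starring anything".
candidates = list(refine(("some", "starring", "Top"), ["Actor"]))
```

Every generated description is subsumed by the input: adding a conjunct can only shrink the extension, and both restrictions are monotone in their filler.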
12. Prediction
• Given an unseen individual a, its property values are
determined by traversing the tree structure
• Given a test concept D:
• if K |= D(a) the left branch is followed
• if K |= ¬D(a) the right branch is followed
• otherwise, a default model is returned
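The three-way traversal can be sketched as below. The classes `Node` and `Leaf`, the `entails` callback (standing in for a reasoner and returning `True` when K |= D(a), `False` when K |= ¬D(a), and `None` when neither is entailed) and the content of the default model are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    prediction: Sequence[float]  # one value per target property


@dataclass
class Node:
    concept: str                 # label of the test concept D
    left: Union["Node", Leaf]    # followed when K |= D(a)
    right: Union["Node", Leaf]   # followed when K |= not D(a)


def predict(tree, individual, entails, default):
    """Traverse the tree; return `default` as soon as membership is unknown."""
    node = tree
    while isinstance(node, Node):
        answer = entails(node.concept, individual)
        if answer is True:
            node = node.left
        elif answer is False:
            node = node.right
        else:
            return default
    return node.prediction


# Illustrative tree and a lookup table playing the role of the reasoner.
tree = Node("Comedy", Leaf((8.45, 9810666.0)), Leaf((4.7, 4200000.0)))
known = {("Comedy", "film1"): True, ("Comedy", "film2"): False}


def entails(concept, individual):
    return known.get((concept, individual))  # None when nothing is entailed


default_model = (6.0, 5000000.0)  # hypothetical default prediction vector
```

The third branch is what distinguishes this from an ordinary decision tree: under the Open World Assumption a missing assertion does not mean the individual is outside the concept.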
13. Experiments
Settings
• Ontologies extracted from DBPedia via crawling
• Maximum depth for PCTs: 10, 15, 20
• Comparison w.r.t. Terminological regression trees (TRT), a
multi-target k-nn regressor (with $k = \sqrt{|Tr|}$) and a multi-target
linear regression model
• atomic concepts used as the feature set for the k-nn regressor and
the multi-target linear regression model
• 10-fold cross validation
• performance measured in terms of RRMSE
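The slides do not spell out RRMSE. The sketch below assumes the definition commonly used for multi-target regression, the relative root mean squared error: the model's squared error on a target divided by the squared error of always predicting that target's mean, under a square root, then averaged over the targets. Here the mean is taken over the supplied actual values, which is also an assumption. The helper for $k$ simply rounds the square root of the training set size; all names are hypothetical.

```python
import math


def rrmse(predicted, actual):
    """Relative RMSE for one target: model error relative to predicting the mean."""
    mean = sum(actual) / len(actual)
    numerator = sum((p - v) ** 2 for p, v in zip(predicted, actual))
    denominator = sum((mean - v) ** 2 for v in actual)
    if denominator == 0:
        raise ValueError("actual values are constant; RRMSE is undefined")
    return math.sqrt(numerator / denominator)


def average_rrmse(predicted_rows, actual_rows):
    """Average the per-target RRMSE over all target properties."""
    t = len(actual_rows[0])
    per_target = [
        rrmse([row[i] for row in predicted_rows], [row[i] for row in actual_rows])
        for i in range(t)
    ]
    return sum(per_target) / t


def neighbourhood_size(training_set_size):
    """k for the k-nn baseline, taken as the rounded square root of |Tr|."""
    return max(1, round(math.sqrt(training_set_size)))
```

Under this definition a value below 1 means the model beats the mean predictor on that target, and 0 means a perfect fit.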
14. Table: Datasets extracted from DBPedia
Dataset    Expressivity  #Axioms  #Classes  #Properties  #Individuals
Fragm. #1  ALCO          17222    990       255          12053
Fragm. #2  ALCO          20456    425       255          14400
Fragm. #3  ALCO          9070     370       106          4499
Table: Target property ranges and number of individuals employed in the
learning problem
Dataset    Property         Range             |Tr|
Fragm. #1  elevation        [-654.14, 19.00]  10000
           populationTotal  [0.0, 2255]
Fragm. #2  areaTotal        [0, 16980.1]      10000
           areaUrban        [0.0, 6740.74]
           areaMetro        [0, 652874]
Fragm. #3  height           [0, 251.6]        2256
           weight           [-63.12, 304.25]
15. Outcomes
Table: RRMSE averaged over the runs
Dataset    PCT           TRT          k-NN         LR
Fragm. #1  0.42 ± 0.05   0.63 ± 0.05  0.65 ± 0.02  0.73 ± 0.02
Fragm. #2  0.25 ± 0.001  0.43 ± 0.02  0.53 ± 0.00  0.43 ± 0.02
Fragm. #3  0.24 ± 0.05   0.36 ± 0.2   0.67 ± 0.10  0.73 ± 0.05
Table: Comparison in terms of elapsed times (secs)
Dataset    Target           PCT    TRT     k-NN   LR
Fragm. #1  elevation        -      2454.3  -      -
           populationTotal  -      2353.0  -      -
           total            2432   4807.3  547.6  234.5
Fragm. #2  areaTotal        -      2256.0  -      -
           areaUrban        -      2345.0  -      -
           areaMetro        -      2345.2  -      -
           total            2456   6946.2  546.2  235.7
Fragm. #3  height           -      743.5   -      -
           weight           -      743.4   -      -
           total            743.3  1486.9  372.3  123.5
16. Discussion
• PCTs outperform TRTs
• the different heuristic allows more promising concepts to be chosen
• standardization mitigated the abnormal values that increase the error
• PCTs outperform k-nn
• k-nn suffers from the curse of dimensionality
• k-nn outperforms LR
• spurious individuals were excluded when determining the local model
• PCTs are more efficient than TRTs
17. Conclusion and Further Outlooks
• We proposed an extension of predictive clustering trees compliant
with DL representation languages for solving the problem of
predicting datatype property values
• Further extensions
• New refinement operators
• Further heuristics
• Linear models at leaf nodes