A Study of Usability-aware Network Trace
Anonymization
Kato Mivule
Los Alamos National Laboratory
Los Alamos, New Mexico, USA
kmivue@gmail.com
Blake Anderson
Los Alamos National Laboratory
Los Alamos, New Mexico, USA
banderson@lanl.gov
Abstract— The publication and sharing of network trace
data is critical to the advancement of collaborative
research among various entities in government, the
private sector, and academia. However, due to the
sensitive and confidential nature of the data involved,
entities have to employ various anonymization techniques
to meet legal requirements in compliance with
confidentiality policies. Nevertheless, the very composition
of network trace data makes applying anonymization
techniques a challenge. At the same time, naive application
of microdata anonymization techniques to network traces
is problematic and does not deliver the
necessary data usability. Therefore, as a contribution, we
point out some of the ongoing challenges in network
trace anonymization. We then suggest usability-aware
anonymization heuristics by employing microdata privacy
techniques while giving consideration to usability of the
anonymized data. Our preliminary results show that with
trade-offs, it might be possible to generate anonymized
network traces with enhanced usability, on a case-by-case
basis using micro-data anonymization techniques.
Keywords—Network Trace Anonymization; Usability;
Differential Privacy; K-anonymity; Generalization
I. INTRODUCTION
While a number of network trace anonymization techniques
have been presented in literature, data utility remains
problematic due to the unique usability requirements by the
different consumers of the privatized network traces. In
addition, a number of microdata privacy techniques from the
statistical and computational sciences are difficult to
implement when anonymizing network traces due to the low usability of the results.
Moreover, finding the right proportionality between
anonymization and data utility of network trace data is
intractable and requires trade-offs on a case-by-case basis,
after a careful consideration of the privacy needs stipulated
by policy makers and the usability requirements of the
researchers who, in this case, are the consumers of the
anonymized data. Furthermore, a generalized approach fails to
deliver unique solutions, as each entity will have unique data
privacy requirements. In this study, we take a look at the
structure of the network trace data. We vertically partition the
network trace data into different attributes and apply micro-
data privatization techniques separately for each attribute. We
then suggest usability-aware anonymization heuristics for the
anonymization process. While a number of anonymization
attacks have been presented in the literature, the main goal of this
study was the generation of anonymized network traces with
better data usability. Therefore, the suggested
heuristics and preliminary results focus on the generation of
anonymized usability-aware network trace data, using privacy
techniques covered in the statistical disclosure control domain;
that include the following: Generalization, Noise addition and
Multiplicative noise perturbation, Differential Privacy, and
Data swapping [38]. A measure of usability, quantifying the
descriptive and inferential statistics of the anonymized data in
comparison with that of the original data is also presented.
Furthermore, we apply frequency distribution analysis and
unsupervised learning techniques in the measure of usability
for the unlabeled data. The rest of the paper is organized as
follows: In Section II, we present a review of related work,
and definition of important terms pertaining to this paper. In
Section III, we present methodologies and usability-aware
anonymization heuristics. In Section IV, the experiment and
results are given. Finally in Section V, the conclusion,
recommendations, and future works are presented.
II. RELATED WORK
One of the challenges of anonymizing network traces is
how to keep the structure and flow of the data intact so as to
provide usability to the consumer of the anonymized data. In
such efforts, Maltz et al. (2004) demonstrated that network
trace data could be anonymized while preserving the structure
of the original data [1]. Additionally, Maltz et al. (2004)
noted that some of the challenges in
anonymizing network traces included figuring out attributes in
the network trace that could leak sensitive information, and
how to anonymize the data such that the original
configurations are preserved [1]. Observations by Maltz et al.
are still relevant today, especially when considering the
intractability between privacy and usability. On the other
hand, Slagell, Wang, and Yurcik (2004) proposed Crypto-Pan,
a network trace anonymization tool that employs
cryptographic techniques in the privatization of network trace
data [2]. While anonymization using cryptographic means
might be effective in concealing sensitive data, usability of the
anonymized data is always a challenge. Bishop, Crawford,
Bhumiratana, Clark, and Levitt (2006) observed that one of
the problems in the anonymization of network traces is that,
when handling IP addresses, the set of available addresses is
finite, thus setting a limit on any anonymization prospects [3].
Each octet in an IP address is limited to the range 0 to 255.
For instance, it would not make much sense to have an
anonymized IP address with an octet value of 345. This
limitation makes the data vulnerable to de-anonymization
attacks. On the issue of de-anonymization attacks, Coull,
Wright, Monrose, Collins, and Reiter (2007) presented
inference techniques for de-anonymizing and detecting
network topologies in anonymized network trace data [4].
Coull et al. showed that topological data could be deduced as
an artifact of functional network packet traces, since data on
the activity of hosts can be leveraged to undermine a
successful obfuscation of the network traces [4]. Moreover,
Coull et al. pointed out that obfuscating network trace data is
not a trivial task as publishers of the data need to be aware of
the tension between balancing privacy and data utility needs
for anonymized network traces [4]. Additionally, Ribeiro,
Chen, Miklau, and Towsley (2008), showed that systematic
attacks on prefix-preserving anonymized network traces,
could be done by adversaries using modest amount of publicly
available information about a network and employing attack
techniques such as finger printing [5]. However, Ribeiro et al.
anticipated that their proposed attack methodologies would be
employed in evaluating worst-case vulnerabilities and finding
trade-offs between privacy and utility in prefix-preserving
privatization of network traces [5]. Therefore, while
researchers might have an interest in anonymized data sets
that maintain the structure and flow of the original data,
curators of that data have to contend with the fact that such
prefix-preserving anonymization is subject to de-
anonymization attacks.
A comprehensive reference model was presented by Gattani
and Daniels (2008), in which they outlined how entities should
formulate the problem of anonymizing network traces [6].
Gattani and Daniels (2008) noted that the anonymization
procedure always aims at the following three goals [6]: (i)
defending the confidentiality of users, (ii) obfuscating the
inner structure of a network, and (iii) generating anonymized
network traces with acceptable levels of usability [6].
However, Gattani and Daniels (2008) observed that attaining
those three anonymization goals is problematic, as removing
too much sensitive information from a network trace
reduces the usability of the anonymized network traces [6].
Additionally, Gattani and Daniels (2008) categorized attacks
on anonymized data as (i) active data injection
attacks, (ii) known mapping attacks, (iii) network topology
inference attacks, and (iv) cryptographic attacks [6]. On the
categorization of attacks, King, Lakkaraju, and Slagell (2009)
presented a taxonomy of attacks on anonymization techniques
with the aim of helping curators of the privatization process
negotiate trade-offs between data utility and anonymization
[7]. King et al. classified attacks on anonymization methods
as (i) fingerprinting, (ii) structure recognition, (iii) known
mapping, (iv) data injection, and (v) cryptographic attacks [7].
A combined categorization of attacks on anonymization
techniques, from Gattani and Daniels, and King et al., would
then be listed as follows [7] [6]: (i) Fingerprinting attacks: in
this category of attacks, attributes of anonymized data are
compared with traits of known network structures to uncover a
relationship between the anonymized and non-anonymized
data. (ii) Data injection attacks: in this type of exploit, an
attacker injects pseudo-traffic data into a network trace before
the anonymization process and uses the pseudo-traffic traces to
de-anonymize the network traces and network structure. (iii)
Structure recognition attacks: in this type of exploit, an
attacker seeks to determine the structure between objects in
the anonymized data to discover multiple relations between
anonymized and non-anonymized data. (iv) Network topology
inference: similar to known mapping attacks, this category of
exploits seeks to retrieve the network topology map by de-
anonymizing the nodes that make up the vertices of the
network, the edges between the nodes that represent the
connectivity and the routers. (v) Known mapping attacks: in
this category of exploit, the attacker relies on external data
(auxiliary data) to find a mapping between the anonymized
network trace data and the original network trace data in order
to retrieve the original IP addresses. (vi) Cryptographic
attacks: in this category of attacks, exploits are carried out to
break cryptographic algorithms used to encrypt the network
traces.
A comparative analysis was done by Coull, Monrose, Reiter,
and Bailey (2009) in which they pointed out the similarities
and differences between network data anonymization and
microdata privatization techniques, and how microdata
obfuscation methods could be applied to anonymize network
traces [8]. Coull et al. observed that uncertainties exist
about the effectiveness of network data anonymization, from
both a methodological and a policy view, with the research
community in need of further study to understand the
implications of publishing anonymized network data and the
utility of such data to researchers [8]. Furthermore, Coull et
al. suggested that the extensive work that exists in the
statistical disclosure control discipline could be employed by
the network research community towards the privatization of
network flow data [8]. On network trace packet
anonymization, Foukarakis, Antoniades, and Polychronakis
(2009), proposed the anonymization of network traces at the
packet level – in the payload of a packet, due to inadequacies
found in various network trace anonymization techniques [9].
Foukarakis et al., suggested identifying revealing information
contained in the shell-code of code injection attacks, and
anonymizing such packets to grant confidentiality in published
network attack traces [9]. On the subject of IP-flow
intrusion detection methods, Sperotto et al. (2010) presented
an overview of IP flow-based intrusion detection, highlighting
the classification of attacks and defense methods, and how
flow-based methods can be used to discover scans,
worms, botnets, and denial of service (DoS) attacks [10].
Furthermore, Sperotto et al. highlighted two types of sampling:
packet sampling, whereby a packet is deterministically chosen
based on a time interval for analysis, and flow sampling, in
which a sample flow is chosen for analysis [10]. At the same
time, Burkhart et al. (2010), in their review of anonymization
techniques, showed that current anonymization techniques are
vulnerable to a series of injection attacks, by inserting attacker
packets into the network flow prior to anonymization, then
later retrieving the packets, thus revealing vulnerabilities and
patterns in the anonymized data [11]. As a mitigation to
injection attacks, Burkhart et al. suggested that anonymization
of network flow data should be done as part of a
comprehensive approach including both legal and technical
perspectives on data confidentiality [11].
Meanwhile, McSherry and Mahajan (2011) showed that
differential privacy could be employed to anonymize network
trace data. Yet despite privacy guarantees provided by
differential privacy, the usability of the privatized data
remains a challenge due to excessive noise from the
anonymization [12]. In their study of applying differential
privacy to network trace data, McSherry and Mahajan
(2011) acknowledged the challenges of balancing
usability and privacy, despite the confidentiality assurances
accorded by differential privacy [13]. On real-time interactive
anonymization, Paul, Valgenti, and Kim (2011) proposed the
Real-time Netshuffle anonymization technique whereby
distortion is done to a complete graph to prevent inference
attacks in network traffic [14]. Netshuffle works by employing
the k-anonymity methodology on network traces, ensuring that
each trace record appears at least k times, with k > 1; shuffling
is then applied to the k-anonymized records, making the data
difficult for an attacker to decipher due to the
distortion [14]. A network trace
obfuscation methodology, (k, j)-obfuscation, was proposed by
Riboni, Villani, Vitali, Bettini, and Mancini (2012), in which a
network flow is considered obfuscated if it cannot be linked,
with high assurance, to its source and destination IP
addresses [15]. Riboni et al. observed from their
implementation of (k, j)-obfuscation, that the large set of
network flows maintained the utility of the original network
trace [15]. However, the context of data utility remains
challenging as each consumer of privatized data will have
unique usability requirements, different levels of needed
assurance, and therefore, utility becomes constrained to a
case-by-case basis, depending on an entity's privacy and
usability needs. On the issue of preserving IP consistency in
anonymized data, Qardaji and Li (2012) observed that full
prefix-preserving IP anonymization suffers from a number of
attacks yet from a usability perspective, some level of
consistency is required in anonymized IP addresses [16]. To
mitigate this problem, Qardaji and Li (2012) proposed
maintaining pseudonym consistency by dividing flow data
into buckets based on temporal closeness and separately
privatizing flows in each bucket, thus maintaining consistency
only within each bucket but not globally across all buckets [16].
Mendonca, Seetharaman, and Obraczka (2012) proposed
AnonyFlow, an interactive anonymization technique that
provides end point privacy by preventing the tracking of
source behavior and location in network data [17]. However,
Mendonca et al. acknowledged that AnonyFlow does not
address issues of complete anonymity, data security,
steganography, and network trace anonymization in non-
interactive settings [17].
On generating synthetic network traces, Jeon, Yun, and Kim
(2013) proposed an anomaly-based intrusion detection system
(A-IDS) to generate pseudo-network traffic for the
obfuscation of real sensitive network traffic in supervisory
control and data acquisition (SCADA) systems [18]. An
overview of network data anonymization was presented by
Nassar, al Bouna, and Malluhi (2013), which highlighted the
need for anonymization algorithms that grant privacy
with an optimal risk-utility trade-off [19]. On using entropy and
similarity distance measures, Xiaoyun, Yujie, Xiaosheng,
Xiaohong, and Yan (2013) employed similarity distance and
entropy techniques in the quantification of anonymized
network trace data [20]. Xiaoyun et al. proposed two types of
similarity measures: (i) external similarity, in which the
distance measurements are done to compute the probability
that an adversary will obtain a one-to-one mapping relation
between the anonymized and the original data, based on
auxiliary knowledge; (ii) Internal similarity, in which distance
measurements are done on the privatized and the original data
to indicate how distinguishable or indistinguishable the data
sets are [20]. On the extraction, classification, and
anonymization of packet traces, Lin, Lin, Wang, Chen, and
Lai (2014) observed that capturing and sharing real network
traffic faces two challenges: first, various protocols are
associated with the packet traces, and second, such packet
traces tend not to be well classified before deep packet
anonymization [21]. Therefore, Lin et al. proposed the PCAPLib
system for the extraction, classification, and deep packet
anonymization of packet traces [21]. In their work on Session
Initiation Protocol (SIP) used in multimedia communication
sessions, Stanek, Kencl, and Kuthan (2014) pointed out that
current network trace anonymization techniques are
insufficient for SIP traces due to the data format of the SIP
trace, which includes the IP address, the SIP URI, and the e-
mail address [22]. To mitigate this problem, Stanek et al.
proposed SiAnTo, an anonymization methodology that
replaces SIP information with non-descriptive but matching
labels [22]. Recently, Riboni, Villani, Vitali, Bettini, and
Mancini (2014) cautioned that current network trace
anonymization techniques are vulnerable to various attacks
while at the same time it is problematic to apply microdata
privatization methods in obfuscating network traces [23].
Moreover, Riboni et al. noted that current obfuscation
methods depend on assumptions about an adversary's
intentions, which are challenging to model, and do not
guarantee privacy against background knowledge attacks [23].
Table I summarizes some of the network trace
anonymization challenges outlined in the literature over the
past ten years.
A. Network trace anonymization techniques
In the following section, a review of some of the common
network trace anonymization techniques is presented [24] [25]
[26] [27] [28] [16]: (i) Black marker technique: in this
method, sensitive values are erased or substituted with fixed
values.
TABLE I. SUMMARY OF NETWORK TRACE ANONYMIZATION CHALLENGES

Author(s): Network Trace Anonymization Challenges
Maltz et al. (2004): Challenge of identifying attributes to anonymize while conserving usability.
Slagell et al. (2004): Crypto-Pan, cryptography to anonymize IP addresses; usability a challenge.
Bishop et al. (2006): Anonymization of IP addresses problematic; the set of IP addresses is finite.
Coull et al. (2007): Obfuscation not a trivial task due to the tension between privacy and usability.
Ribeiro et al. (2008): Prefix-preserving anonymized data subject to fingerprinting attacks.
King et al. (2009): Taxonomy of attacks on anonymization techniques; anonymization challenges.
Coull et al. (2009): Comparison between network and microdata anonymization; significant differences.
Foukarakis et al. (2009): Network trace anonymization at the packet level; a challenge.
Burkhart et al. (2010): Injection attacks on anonymized network trace data.
McSherry and Mahajan (2011): Differential privacy anonymization of network trace data.
Paul, Valgenti, and Kim (2011): Real-time anonymization with k-anonymity.
Riboni et al. (2012): (k, j)-obfuscation; a network flow is obfuscated if it cannot be linked to the original data with high assurance.
Qardaji and Li (2012): Global prefix consistency is subject to attacks.
Mendonca et al. (2012): Interactive network trace anonymization.
Jeon, Yun, and Kim (2013): Synthetic (anonymized) network trace data generation.
Nassar et al. (2013): Balance between utility and privacy needed; still a problem.
Farah and Trajkovic (2013): Network trace anonymization techniques, an overview.
Stanek et al. (2014): Session Initiation Protocol (SIP) anonymization and its challenges.
Riboni et al. (2014): Caution with current network anonymization techniques; vulnerable to attacks.
(ii) Enumeration technique: in this scheme, sensitive values in
a sequence are replaced with an ordered sequence of synthetic
values. (iii) Hash technique: unique values are substituted
with a fixed size bit string in the hash technique. (iv)
Partitioning technique: with the partitioning method,
revealing values are partitioned into a subset of values and
each of the values in the subset is replaced with a generalized
value. For example, an IP address 141.121.10.12, could be
partitioned into four octets and the last two octets replaced
with zero values, 141.121.0.0. (v) Precision degradation
technique: highly specific values of a time-stamp attribute are
removed when employing the precision degradation method.
(vi) Permutation technique: a random permutation is applied to
map non-anonymized IP and MAC addresses to a set of
available addresses. (vii) Prefix-preserving anonymization
technique: in this technique, values of an IP address are
replaced with synthetic values in such a way that the original
structure of the IP address is kept – the prefix values of an IP
address structure is preserved. Prefix-preservation could be
applied fully or partially on the IP address. The fully prefix-
preserving anonymization will map the full structure of the
original IP address in the anonymized data, while the partially
prefix-preserving anonymization will preserve a select
structure of the original IP address, for example the first two
octets. (viii) Random time shift technique: this methodology
works by applying a random value as an offset to each value
in the field. (ix) Truncation technique: with this technique,
part of the IP or MAC address is suppressed or deleted while the
remaining part of the address is left intact. (x) Time unit
annihilation: In this partitioning anonymization methodology,
part of the time-stamp is deleted and replaced with zeros.
Table I summarizes these ongoing challenges from the literature
on anonymizing network traces. Although a number of
network trace anonymization solutions have been proposed in
the literature, usability of the anonymized data remains
problematic. While a number of challenges exist, this study
focuses on the challenge of usability-aware
anonymization of network traces. A brief illustrative sketch of
a few of the techniques above follows.
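As a minimal sketch (not from the paper; the parameter choices and example values are illustrative assumptions), the black marker, partitioning/truncation, and random time shift techniques could look as follows in Python:

```python
import random

def black_marker(value, token="xxx.xxx.xxx.xxx"):
    # Black marker: erase the sensitive value, substituting a fixed token.
    return token

def partition_ip(ip, keep_octets=2):
    # Partitioning/truncation: keep the leading octets and zero the rest,
    # e.g., 141.121.10.12 -> 141.121.0.0.
    octets = ip.split(".")
    return ".".join(octets[:keep_octets] + ["0"] * (4 - keep_octets))

def random_time_shift(timestamps, max_offset=3600):
    # Random time shift: add one random offset to every value in the field,
    # which conceals absolute times but preserves inter-arrival gaps.
    offset = random.randint(1, max_offset)
    return [t + offset for t in timestamps]

print(black_marker("141.121.10.12"))
print(partition_ip("141.121.10.12"))
print(random_time_shift([1123355142, 1123355214]))
```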
B. Statistical disclosure control techniques
The following are some of the main microdata privatization
methods used. Suppression: in this technique, revealing and
sensitive data values are deleted from a data set at the cell
level [29]. Generalization: to achieve confidentiality for
revealing values in an attribute, a single value is allocated to a
group of sensitive values in the attribute [30]. K-anonymity: in
this method, data privacy is enforced by requiring that all
values in the quasi-identifier attributes be repeated k times, such
that k > 1, thus providing confidentiality and making it harder to
uniquely distinguish individual values. K-anonymity employs
both generalization and suppression methods to achieve k > 1
[31]. Data swapping: Data swapping is a data privacy
technique that involves exchanging sensitive cell values with
other cell values in the same attribute while keeping intact the
frequencies and statistical traits of the original data, and as
such, making it difficult for an attacker to map the privatized
values to the original record [32]. Noise addition: noise
addition is a data privacy method that adds random values
(noise) to revealing and sensitive numerical values in the
original data to ensure confidentiality. The random values are
usually generated using the mean and standard deviation
of the original values [33]:
$X_i + \varepsilon_i = Z_i$  (1)
Multiplicative noise: similar to noise addition, random values
generated using the mean and variance of the original data
values are multiplied with the original data, generating a
privatized data set [34]:

$X_i \cdot \varepsilon_i = Z_i$  (2)

where X is the original data, Z the privatized data, and ε the
random values.
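As a hedged sketch of equations (1) and (2), assuming normally distributed noise parameterized by the original data's own statistics (the exact noise distribution is a design choice left to the curator):

```python
import numpy as np

rng = np.random.default_rng(0)

def additive_noise(x):
    # Z = X + eps, eq. (1); one reading of drawing noise "using the mean
    # and standard deviation" of the original values.
    eps = rng.normal(x.mean(), x.std(), size=x.shape)
    return x + eps

def multiplicative_noise(x):
    # Z = X * eps, eq. (2); noise centered at 1 so values scale modestly.
    eps = rng.normal(1.0, 0.1, size=x.shape)
    return x * eps

x = np.array([10.0, 12.0, 15.0, 11.0])
print(additive_noise(x))
print(multiplicative_noise(x))
```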
Differential Privacy: Similar to noise addition,
differential privacy imposes privacy by adding Laplace noise
to query results from the database such that it cannot be
distinguished if a particular value has been adjusted in that
database or not; making it more difficult for an attacker to
decode items in the database [35]. ε-differential privacy is
satisfied if the results of a query run on databases D1 and D2
are probabilistically similar, meeting the following
condition [35]:

$\frac{P[q_n(D_1) \in R]}{P[q_n(D_2) \in R]} \leq e^{\varepsilon}$  (3)
where D1 and D2 are two databases differing in at most one
record; P is the probability of the perturbed query results;
$q_n()$ is the privacy-granting (perturbation) procedure;
$q_n(D_1)$ and $q_n(D_2)$ are the privacy-granting procedures
applied to query results from databases D1 and D2 respectively;
R is the set of perturbed query results; and $e^{\varepsilon}$
is the exponential of the epsilon value. Differential privacy
can be implemented as follows [36]:
(i) Run the query on the database, where f(x) is the query
function.
(ii) Calculate the most influential observation (the sensitivity):

$\Delta f = \max \lVert f(D_1) - f(D_2) \rVert$  (4)

(iii) Calculate the Laplace noise scale:

$b = \Delta f / \varepsilon$  (5)

(iv) Add Laplace noise to the query results:

$DP = f(x) + \text{Laplace}(0, b)$  (6)

(v) Publish the perturbed query results in interactive (query
responses) or non-interactive (macro-, microdata) mode.
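A minimal sketch of steps (i) through (v) for a simple count query; the data set, query, and epsilon below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_count(records, epsilon=0.5):
    true_answer = len(records)      # (i) run the query f(x)
    sensitivity = 1.0               # (ii) Δf: a count changes by at most 1
    b = sensitivity / epsilon       # (iii) Laplace scale, eq. (5)
    noise = rng.laplace(0.0, b)     # (iv) draw Laplace(0, b) noise, eq. (6)
    return true_answer + noise

flows = list(range(500))            # stand-in for network flow records
print(dp_count(flows))              # (v) publish the perturbed answer
```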
C. Metrics used to quantify usability in this study
The Shannon entropy: entropy is used essentially to measure
the amount of randomness and uncertainty in a data set; if all
values in a set of information fall into one category, then
entropy is zero. Probability is used to quantify
randomness of elements in an information set; normalized
entropy values range from 0 to 1, getting to the upper bound
level when all probabilities are equal [37] [36]. Entropy is
formally described using the following formula [37]:
$\text{Entropy} = H(p_1, p_2, \ldots, p_n) = \sum_{i=1}^{n} p_i \cdot \log_2 \frac{1}{p_i}$  (7)

where $p_i$ is the probability of the i-th value and
$H(p_1, p_2, \ldots, p_n)$ is the entropy over those probabilities.
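A sketch of equation (7), normalized by $\log_2$ of the number of distinct values so that results fall in [0, 1], matching the normalized entropies reported later in Tables II and III:

```python
import numpy as np
from collections import Counter

def normalized_entropy(values):
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()                # empirical probabilities p_i
    h = -(p * np.log2(p)).sum()              # H = sum p_i * log2(1/p_i)
    k = len(counts)
    return h / np.log2(k) if k > 1 else 0.0  # normalize to [0, 1]

print(normalized_entropy([1, 1, 2, 3, 3, 3]))
```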
Correlation Metric (between Original data X and Privatized
data Z): Correlation rxz computes the inclination and tendency
of an additive linear relation between two variables; the
correlation is dimensionless, independent of the units in
which the data points x and z are measured, and is expressed as
follows [38]:
$\text{Correlation } r_{xz} = \frac{\text{Cov}(x, z)}{\sigma_x \sigma_z}$  (8)
where Cov(x, z) is the covariance of X and Z, and $\sigma_x$ and
$\sigma_z$ are the standard deviations of X and Z. If $r_{xz}$ = -1,
then a negative linear relation exists between X and Z; if $r_{xz}$
= 0, no linear relation exists between X and Z; and when $r_{xz}$
= 1, a strong positive linear relation between X and Z exists. Descriptive
standard deviation, variance, etc., are used in quantifying how
much distortion there is between the anonymized and original
data. The larger the difference, the more privacy but also an
indication of less usability; the closer the difference, the more
usability but perhaps less privacy. The format used in the
quantification is always in the form [36]:
$\text{Usability} = DS(Z) - DS(X)$  (9)
where Z is the anonymized data, X the original data, and
DS the descriptive statistic. Distance Measures Metric
(Euclidean Distance): For distance measures, we employed
clustering with k-means to evaluate how well the clustering of
the original data compares with that of the anonymized data.
In this case, the Euclidean Distance is used for k-means
clustering and is expressed as follows [39]:
$\text{distance}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$  (10)
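As a sketch of the clustering comparison, assuming stand-in data and scikit-learn's KMeans (which uses the Euclidean distance of eq. (10)); k = 5 matches the experiments reported below:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
original = rng.normal(size=(1000, 2))    # stand-in (start, end) time pairs
anonymized = original + 0.7 * original   # e.g., the enumeration heuristic

for name, data in [("original", original), ("anonymized", anonymized)]:
    km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(data)
    sizes = np.bincount(km.labels_)      # items per cluster, as in Figure 5
    print(name, "cluster sizes:", sorted(sizes.tolist(), reverse=True))
```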
The Davies-Bouldin index was also used to evaluate
how well the clustering performed. The Davies-Bouldin Index
(DBI) is expressed as follows [21]:
$DBI = \frac{1}{n} \sum_{i=1}^{n} D_i$  (11)

where $D_i \equiv \max_{j \neq i} R_{i,j}$  (12)

and $R_{i,j} \equiv \frac{S_i + S_j}{M_{i,j}}$  (13)
Here, $R_{i,j}$ quantifies the quality of the clustering, $S_i$ and
$S_j$ are the within-cluster distances of clusters i and j, and
$M_{i,j}$ is the distance between the clusters. Classification
Error Metric: With the
classification error test, both the original and anonymized data
are passed through machine learning and the classification
error (or accuracy) is returned. The classification error (CE) of
the anonymized data is subtracted from that of the original.
The larger the difference, the more privacy (due to distortion);
this might be an indication of low usability. However, a
smaller difference might indicate better usability but then low
privacy, as anonymized results might be closer to the original
data in similarity. Depending on the machine-learning
algorithm used, the classification error metric will be in this
form [36]:
$\text{Usability Gauge} = CE(Z) - CE(X)$  (14)

where Z is the anonymized data, X the original data, and
CE is the classification error.
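A hedged sketch combining the correlation gauge (eq. 8) and the classification-error gauge (eq. 14); the data, the synthetic labels, and the choice of logistic regression as the machine-learning stage are illustrative assumptions, not the study's configuration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def classification_error(features, labels):
    xtr, xte, ytr, yte = train_test_split(features, labels, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(xtr, ytr)
    return 1.0 - clf.score(xte, yte)        # CE = 1 - accuracy

X = rng.normal(size=(500, 4))               # stand-in original features
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # synthetic labels for the test
Z = X + rng.normal(0.0, 0.5, size=X.shape)  # noise-added "anonymized" data

print("correlation r_xz:", np.corrcoef(X[:, 0], Z[:, 0])[0, 1])  # eq. (8)
print("usability gauge:",
      classification_error(Z, y) - classification_error(X, y))   # eq. (14)
```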
III. METHODOLOGY
In this section, we describe the implemented methodology;
in this case, heuristics used in the anonymization of network
trace data, within the context of usability while at the same
time granting privacy requirements. The goal of the heuristics
is to provide anonymized data, with statistical traits close to
the original data, that could be used by researchers. The
trade-off in this case is that we tilt towards more utility while
making it harder for an attacker to recover the original data,
assuming that the attacker has no prior knowledge. Because of
the unique data structure of network traces, a single
generalized approach is not applicable in anonymizing all the
network trace attributes. In our approach, we apply a hybrid of
anonymization heuristics for each group of related attributes.
Combinations of microdata anonymization techniques were
used in this study, as illustrated in Figure 1. The following
attributes were anonymized in the network trace data: (i) Start
and End Time (Time-stamp), (ii) Source IP and Destination
IP, (iii) Protocol, (iv) Source Port and Destination Port, (v)
Source Packet Size and Destination Packet Size, (vi) Source
Bytes and Destination Bytes, (vii) TOS Flags. However, due
to space constraints, we only present results for the Timestamp
and IP Address attributes.
Figure 1: An illustration of the proposed anonymization heuristics for the network trace data.
A. Enumeration with multiplicative perturbation
To preserve the flow structure of the timestamp, we
employed enumeration with multiplicative perturbation, a
heuristic that combines the multiplicative noise technique
from the microdata privatization methods with enumeration
from network trace anonymization. The Enumeration with
Multiplicative Perturbation Heuristic is implemented as
follows: Step (i): A small epsilon constant value is chosen
between 0 and 1. Data curators could conceal this arbitrarily
chosen value as an additional layer of confidentiality.
Step (ii): The epsilon constant is then multiplied with the
original data (the timestamp, both the Start and End Time
attributes), generating an enumerated set.
Step (iii): The enumerated data is then added to the
original data, producing an anonymized data set. Step (iv): A
test for usability is done, using descriptive statistical analysis,
entropy, correlation, and unsupervised learning using
clustering (k-means). Step (v): If the desired threshold is met,
the anonymized data is published. The goal with this heuristic
is to keep the time flow structure intact and similar to the
original data while at the same time anonymizing the time
series values. In this case, the anonymized time series data
should generate similar usability results to the original.
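A minimal sketch of steps (i) through (iii) of this heuristic; with ε in (0, 1), the output Z = X + εX = (1 + ε)X rescales the series while preserving its ordering and relative flow structure:

```python
import numpy as np

rng = np.random.default_rng(0)

def enumerate_multiplicative(timestamps, epsilon=None):
    if epsilon is None:
        epsilon = rng.uniform(0.0, 1.0)  # step (i): concealed constant
    enumerated = timestamps * epsilon    # step (ii): enumerated set
    return timestamps + enumerated       # step (iii): anonymized series

start_times = np.array([1123355142.0, 1123355150.0, 1123355214.0])
print(enumerate_multiplicative(start_times))
```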
B. Generalization and differential privacy
The IP address is one of the most challenging attributes to
anonymize since each octet of the IP address is limited to a
finite set of numbers, from 0 to 255. This makes the IP address
attribute vulnerable to attackers in attempts to de-anonymize
the privatized network trace [3]. With such restrictions, the
curator of the data is left with the choice of completely
anonymizing the IP address by employing full perturbation
techniques, which in turn distorts the flow structure and prefix
of the IP address, resulting in poor data usability. One
solution to this problem would be to employ heuristics that
would grant anonymization and at the same time keep the
prefix of the IP address intact. However, full IP address
prefix-preserving anonymization has been shown to be prone
to de-anonymization attacks, presenting yet another challenge
[5]. Therefore, to deal with this problem, we suggest a partial
prefix-preserving heuristic in which differential privacy and
generalization are used and implemented as follows: Octet 1,
anonymization: The IP address is split into four octets.
Generalization is applied to the first octet to partially preserve
the prefix of the anonymized IP address. The goal is to give
the users of the anonymized data some level of usability by
letting them observe a synthetic flow of the IP address structure in
the network trace. Step (i): A small epsilon constant value is
chosen and used for application (added or multiplied to data)
with noise addition or multiplicative noise on the first octet.
The goal is to preserve the flow structure in the first octet.
Step (ii): A frequency count analysis is done at this stage to
check that none of the first octet values from the original data
reappear in the anonymized data. Step (iii): If first octet
values do reappear in the anonymized data, they are generalized
by replacing them with the most frequent values
in the anonymized first octet. Step (iv): Finally,
generalization and k-anonymity are applied to ensure no
unique values appear, and that all values in the first octet
appear k >1. Step (v): A test for usability by comparing the
original and anonymized first octet values, is done. Octet 2, 3,
and 4 anonymization: To make it difficult to de-anonymize
the full IP address, randomization using differential privacy is
applied to the remaining three octets. However, since each
octet is limited to a set of 0 to 255 finite numbers, the
differential privacy perturbation process will generate some
values that would exceed 255; for instance, it would not be
meaningful to have an octet value of 350. To mitigate this
situation, a control statement is introduced at the end of the
differential privacy process, to exclude all values greater than
an IP class address and octet range. In this case, any values
greater than 255 are excluded from the end results of the
perturbation process. Differential privacy is applied to each of
the three octets vertically and separately. Step (i): A vertical
split of octet 2, 3, and 4 into separate attributes, is done. Step
(ii): Anonymization using differential privacy on each
attribute (octet) separately is done at this stage. Step (iii): Test
to ensure that anonymized values in each octet are in range,
from 0 to 255. Step (iv): If anonymized values in an octet fall
outside the 0 to 255 range, replace them with a generalized
value, the most frequent value within that range. Step (v):
Test for usability. Step (vi): Combine all octets into a full
anonymized IP address. A sketch of this heuristic is given below.
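A hedged sketch of the partial prefix-preserving heuristic, under stated assumptions: the octet 1 step is simplified to one shared shift standing in for the noise-plus-generalization-plus-k-anonymity procedure, and out-of-range Laplace draws on octets 2 through 4 are generalized to the most frequent in-range value:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def anonymize_ips(ips, epsilon=1.0):
    octets = np.array([[int(o) for o in ip.split(".")] for ip in ips])
    out = octets.copy()
    # Octet 1: a shared shift keeps the flow structure (a simplified
    # stand-in for the noise + generalization steps described above).
    out[:, 0] = (octets[:, 0] + 160) % 256
    # Octets 2-4: differential-privacy style Laplace perturbation.
    for j in (1, 2, 3):
        noisy = np.rint(octets[:, j]
                        + rng.laplace(0.0, 1.0 / epsilon, size=len(ips)))
        noisy = noisy.astype(int)
        in_range = noisy[(noisy >= 0) & (noisy <= 255)]
        fill = (Counter(in_range.tolist()).most_common(1)[0][0]
                if len(in_range) else 0)
        noisy[(noisy < 0) | (noisy > 255)] = fill  # generalize out-of-range
        out[:, j] = noisy
    return [".".join(map(str, row)) for row in out]

print(anonymize_ips(["141.121.10.12", "141.121.10.99", "10.0.0.1"]))
```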
IV. RESULTS
Preliminary results are presented in this section. However,
due to space limitation in this publication, only results for the
timestamp and IP address attributes are presented. Real 2014
network trace (NetFlow) data provided by Los Alamos
National Laboratory were used in this experiment. A total of
500,000 network flow records were anonymized in this study.
Microdata obfuscation techniques were applied for the
anonymization process. Each attribute of the NetFlow trace
was anonymized separately.
A. Timestamp anonymization and usability results
Descriptive statistical analysis was done on both the original
and anonymized data sets, as shown in Table II. The aim was
to study the statistical traits of both the original and
anonymized data sets and show any similarities. In this case,
the statistical traits of the anonymized data show an
augmentation of the original data – a generation of a synthetic
data set in this case. For instance, the original means of the start
time and end time were 1123355142 and 1123355214
respectively, while those of the anonymized data set were
1944808589 and 1944808714. The differences between the
anonymized and original data were 821453447 and
821453500 respectively. A larger difference might indicate
more privacy and less usability, while a smaller difference
might indicate better usability but less privacy. The results
presented in Table II indicate a middle ground, with both privacy
and usability needs met after trade-offs (the difference).
TABLE II. STATISTICAL TRAITS OF ORIGINAL AND ANONYMIZED
TIMESTAMP DATA
However, to meet the requirements of different users for the
anonymized data, a fine-tuning of the parameters in the
anonymization heuristics would need to be done. Additionally,
the normalized Shannon's entropy results, as shown in Table
II, were similar for both original and anonymized data at
approximately 0.77 and 0.76 for the start and end times
respectively. The entropy results indicate that the distortions
and uncertainty in both data sets might be similar. While the
entropy results might be good for usability, it could likewise
be argued that privacy levels might be inadequate since the
two data sets are similar in that regard. However, the
correlation values between the anonymized and original data
were 0.532 and 0.534 for the start and end time attributes
respectively. The results could indicate that while correlations
exist between the two data sets, the significance is not that
high since the values do not approach 1.
Figure 2: K-means clustering results for the original start and end time data.
The results might indicate that privacy is maintained in the
anonymized data, with an acceptable level of usability. In
Figure 2, results from clustering the original network trace
data (timestamp attribute) are presented. The x-axis in Figure 2
represents the start-time, while the y-axis represents the end-
time of the activity in the network trace. The value of k for the
k-means was set to 5 in this experiment. From an anecdotal
point of view, we can see that the clustering results in Figure 2
have their own skeletal structure. However, this is not the case
in Figure 3. In Figure 3, data privacy using noise addition was
applied idealistically, without much consideration given to the
issue of usability.
Figure 3: Idealistic Privacy application and clustering results
An anecdotal view of results in Figure 3 might point to better
privacy, since the skeletal cluster structure of the original data
was dismantled and replaced with a new skeletal cluster
structure.
Figure 4: K-means clustering for the anonymized start and end-time data.
However, usability remains a challenge, as the anonymized
clustering results are far from being close to the original
clustering. In the case of this study, the aim was to obtain
clustering results with better usability. Therefore, a re-tuning
of the parameters in the data privacy procedure is done to
achieve better usability. On the other hand, the goal of using
cluster analysis with k-means was to analyze how the
unlabeled original network trace data would perform in
comparison to the anonymized data. Furthermore, the Davies-
Bouldin criterion shows a value of 0.522, as depicted in Table
II, indicating how well the clustering algorithm (k-means)
performed with the original time-stamp (start and end times)
data. In Figure 4, clustering results (with k=5 for the k-means)
for the anonymized data are presented, with the x-axis
showing the start time and the y-axis presenting the end time.
Figure 5: K-means Cluster performance showing the average distance within
centroid and items in each cluster
The Davies-Bouldin criterion for the clustering performance on
the anonymized data was 0.393 as shown in Table II, a value
lower than that of the original data, and an indication of better
clustering. However, while an anecdotal view of the plots
shows that the cluster results look similar, the number of items
in each cluster in the anonymized data differ from that of the
original, as shown in Figure 5. For instance, in Figure 5, the
number of items in cluster 0 for the original data is at 310678,
while that of the anonymized data is at 291002. The trade-off
would be the difference of 19676 items. The challenge
remains how to effectively balance anonymity and usability
requirements with trade-offs. In this case, if the usability
threshold is not met, then the curator can fine-tune the
anonymization parameters. The average-within-centroid
distance returned a lower value for the anonymized data at
77865, and for the original data at 157093, with the lower
value indicating better clustering, as shown in Figure 5.
B. Source and destination IP address anonymity results
The IP address remains a challenging attribute to anonymize
due to the finite nature of the IP addresses. Each octet is
limited to a range of 0 to 255 and obfuscation becomes
constricted to that range. As we hinted earlier, it would not
be meaningful to have octet values ranging between 270
and 450, for instance. In this section we present preliminary
results on the anonymization and usability of the source and
destination IP attribute values using the heuristics in Section III.
Correlation: The correlation between the original and
anonymized data, as shown in Table III, for the first octet of
the source and destination IP shows values of 0.9 and 1
respectively. These strong correlation values are indicative of
a strong linear relationship between the original and
anonymized octet 1 data. The first octet of the IP address was
anonymized using noise addition and generalization to keep
the flow structure similar to the original. Since a partial prefix
preserving anonymization was used, it is noteworthy that there
are strong correlation values between the original and
anonymized data for the first octet IP values.
TABLE III. STATISTICAL TRAITS OF ORIGINAL AND ANONYMIZED SOURCE AND DESTINATION IP ADDRESSES
Our view is that a researcher could still derive general
network information from the flow structure presented by the
first octet in the IP address without compromising the
specifics of the other three inner octets. Yet the correlation
between the anonymized data and the original data for octets 2,
3, and 4 shows values of 0 for the destination IP addresses
and minimal values of -0.081, 0.093, and 0.213 for the source IP
addresses, indicating a very weak relationship
between the anonymized and original data for octets 2, 3, and
4. However, the very low correlation values might be a good
indicator for stronger privacy, since we employed differential
privacy in the anonymization of octets 2, 3 and 4. Therefore
the correlation between the anonymized and original data
would be nonexistent or at least very minimal due to the
differential privacy randomization. Hence the partial prefix-
preserving heuristic works in this case: the user of the
anonymized data is only able to derive information from the
first octet while all other internal IP address information is
kept confidential.
Entropy: The Shannon Entropy test was done on both the
original and anonymized data IP addresses to study the
uncertainty and randomness in the data sets. The normalized
Shannon's entropy values range between 0 and 1, with 0
indicating certainty and 1 indicating uncertainty. As shown in
both Table III and Figure 6, the entropy values for octet 1 in
both the original and anonymized data are approximately
0.1, indicative of certainty of values and thus maintenance of
the flow in the first octet. However, there is much less certainty
for octets 3 and 4 in the original data and for octets 2, 3, and 4
in the anonymized data, though the entropy of the latter is lower
than that of the original. Nevertheless, octet 2 in the original data provides
more certainty than octet 2 in the anonymized data. While the
entropy levels in octets 3 and 4 in the original data seem higher
than those of the anonymized data, overall, octets 2, 3, and 4 in
the anonymized data provide more distributed uncertainty,
better randomness, and thus better anonymity. Yet still, we
constrained the random values in octet 2, 3, and 4 generated
during the differential privacy procedure not to exceed 255.
An octet value of 355 or 400 would affect the usability of the
anonymized IP address data. However, it could be argued that
the certainty levels are maintained in octet 1 for both original
and anonymized data, with distortion on octet 2, 3, and 4 in
the anonymized data, indicating that the flow structure is kept,
and thus partial prefix-preserving anonymity might be
achieved.
Figure 6: Normalized Shannon's Entropy values for the original and
anonymized IP addresses.
Frequency Distribution histogram analysis: Furthermore, we
did a frequency analysis to compare the distribution of values
in each octet in the IP address, for both the original and
anonymized IP addresses. For the original data, the number of
items in octet 1 between 40 and 45, that is, source IP addresses
that start with octet values 40 to 45, came to approximately
400,000 out of 500,000 records, as shown in Figure 7. Similar
results were actualized for the destination IP address, for octet
1 with about 300,000 items with values 40 to 45, as illustrated
in Figure 8. With the exception of octet 2, the values in octets 3
and 4 are distributed across the range 0 to 85 in the original IP
address data; this correlates with the results shown in Figure 6,
with higher entropy values for octets 3 and 4 in the original
data, indicating more uncertainty. The x-axis in each graph
represents the IP octet values, and the y-axis, shows the
frequency of each of those octet values. However, a look at
the anonymized IP address data shows that octet 1 had about
390,000 values beginning with 200, as
shown in Figures 9 and 10 for the source and destination IP
address data respectively. The results show the effect of
generalization used in the obfuscation of the original data for
octet 1. The values in octet 2, 3 and 4 were distributed across
the 0 to 255 range, with the highest concentration around octet
value 190 due to the constraints placed on the differential
privacy results, to prevent the return of values greater than 255.
As mentioned earlier, it would not be meaningful to have
differential privacy results that exceed 255. For octets 2, 3, and
4, the Laplace distribution is retained due to the noise distribution
used in the differential privacy process.
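As a small sketch of this frequency analysis, assuming a list of dotted-quad IP strings, each octet position can be histogrammed and the original and anonymized distributions compared:

```python
import numpy as np

def octet_histograms(ips, bins=32):
    # Count octet values per position over the 0-255 range.
    octets = np.array([[int(o) for o in ip.split(".")] for ip in ips])
    return [np.histogram(octets[:, j], bins=bins, range=(0, 256))[0]
            for j in range(4)]

ips = ["141.121.10.12", "141.121.44.7", "40.2.99.200"]
for j, hist in enumerate(octet_histograms(ips), start=1):
    print("octet", j, ":", hist)
```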
Figure 7: Frequency distribution for the original source IP octet values.
Figure 8: Frequency distribution for the original destination IP octet values
Our recommendation as a result of this study is that curators
strongly consider a privacy engineering approach during
the anonymization process.
V. CONCLUSION
Anonymizing network traces while maintaining an acceptable
level of usability remains a challenge, especially when
employing privatization techniques used for microdata
obfuscation. Moreover, obfuscating network traces remains
problematic due to the IP addresses and octet values being
finite. Furthermore, generalized anonymization approaches
fail to deliver specific solutions, as each entity will have
unique data privacy and usability requirements, and the data in
most cases have varying characteristics to be considered
during the obfuscation process. In this study, we have
provided a review of literature, pointing out some of the
ongoing challenges in the network trace anonymization over
the last 10-year period. We have suggested usability-aware
anonymization heuristics by employing microdata privacy
techniques, while taking into consideration the usability of the
anonymized network trace data. Our preliminary results show
that with trade-offs, it might be possible to generate
anonymized network traces on a case-by-case basis, using
micro-data anonymization techniques, such as differential
privacy, k-anonymity, generalization, and multiplicative noise
addition.
Figure 9: Frequency distribution for anonymized source IP octet values
In the initial stage of the privacy engineering process, the
curators could gather privacy and usability requirements from
the stakeholders involved, including both the policy
makers and the anticipated users (researchers) of the anonymized
network trace data. The curators could then model the most
applicable approach given trade-offs, on a case-by-case basis.
The generated anonymization model could then be
implemented across the enterprise for uniformity and
prevention of information leakage attacks. On the limitations
of this study, focus was placed on usability-aware
anonymization of network trace data and not on the types of
attacks on anonymized network traces. While some
consideration and mention of anonymization attacks was
given in this study, focusing on de-anonymization attacks was
beyond the scope of this study, and a subject left for future
work.
Figure 10: Frequency distribution for anonymized destination IP octet values
ACKNOWLEDGMENT
We would like to express our appreciation to the Los
Alamos National Laboratory, and more specifically, the
Advanced Computing Solutions Group, for making this work
possible.
REFERENCES
[1] D. A. Maltz, J. Zhan, G. Xie, H. Zhang, G. Hjálmtýsson, A. Greenberg,
and J. Rexford, “Structure preserving anonymization of router
configuration data”, In Proceedings of the 4th ACM SIGCOMM
conference on Internet measurement (IMC '04), 2004, Pages 239-244.
[2] A. Slagell, J. Wang, and W. Yurcik, "Network log anonymization:
Application of crypto-pan to cisco netflows." In Proceedings of the
Workshop on Secure Knowledge Management, 2004.
[3] M. Bishop, R. Crawford, B. Bhumiratana, L. Clark, and K. Levitt,
"Some problems in sanitizing network data.", 15th IEEE International
Workshops on Enabling Technologies: Infrastructure for Collaborative
Enterprises, 2006., pp. 307-312.
[4] S.E. Coull, C.V. Wright, F. Monrose, M.P. Collins, and M.K. Reiter,
"Playing Devil's Advocate: Inferring Sensitive Information from
Anonymized Network Traces." In NDSS, 2007, vol. 7, pp. 35-47.
[5] B.F. Ribeiro, W. Chen, G. Miklau, and D.F. Towsley, "Analyzing
Privacy in Enterprise Packet Trace Anonymization." In NDSS, 2008.
[6] S. Gattani and T.E. Daniels, “Reference models for network data
anonymization”, In Proceedings of the 1st ACM workshop on Network
data anonymization (NDA '08), 2008, pp. 41-48.
[7] J. King, K. Lakkaraju, and A. Slagell. "A taxonomy and adversarial
model for attacks against network log anonymization." In Proceedings
of the 2009 ACM symposium on Applied Computing, 2009, pp. 1286-
1293.
[8] S.E. Coull, F. Monrose, M.K. Reiter, M. Bailey, "The Challenges of
Effectively Anonymizing Network Data," Conference For Homeland
Security, CATCH 2009, pp.230-236.
[9] M. Foukarakis, D. Antoniades, and M. Polychronakis, “Deep packet
anonymization”, In Proceedings of the Second European Workshop on
System Security (EUROSEC '09). ACM, 2009, pp. 16-21.
[10] A. Sperotto, G. Schaffrath, R. Sadre, C. Morariu, A. Pras, and B. Stiller,
"An overview of IP flow-based intrusion detection." Communications
Surveys & Tutorials, IEEE 12, no. 3, 2010, pp. 343-356.
[11] M. Burkhart, D. Schatzmann, B. Trammell, E. Boschi, and B. Plattner.
"The role of network trace anonymization under attack.", ACM
SIGCOMM Computer Communication Review 40, no. 1, 2010, pp. 5-
11.
[12] F. McSherry and R. Mahajan, "Differentially-private network trace
analysis.", ACM SIGCOMM Computer Communication Review 41.4,
2011, pp. 123-134.
[13] F. McSherry and R. Mahajan, "Differentially-private network trace
analysis.", ACM SIGCOMM Computer Communication Review 41, no.
4, 2011, pp. 123-134.
[14] R.R. Paul, V.C. Valgenti, M. Kim, "Real-time Netshuffle: Graph
distortion for on-line anonymization," Network Protocols (ICNP), 19th
IEEE International Conference on, 2011, pp. 133-134.
[15] D. Riboni, A. Villani, D. Vitali, C. Bettini, L.V. Mancini, "Obfuscation
of sensitive data in network flows," INFOCOM, 2012 Proceedings,
IEEE, 2012, pp.2372-2380.
[16] W. Qardaji and N. Li, "Anonymizing Network Traces with
Temporal Pseudonym Consistency." IEEE 32nd International
Conference on Distributed Computing Systems Workshops (ICDCSW),
2012, pp. 622-633.
[17] M. Mendonca, S. Seetharaman, and K. Obraczka, "A flexible in-network
ip anonymization service.", In Communications (ICC), 2012 IEEE
International Conference, pp. 6651-6656.
[18] S. Jeon, J-H. Yun, and W-N. Kim, “Obfuscation of Critical
Infrastructure Network Traffic using Fake Communication”, Annual
Computer Security Applications Conference (ACSAC) 2013, Poster.
[19] M. Nassar, B. al Bouna, and Q. Malluhi, "Secure Outsourcing of
Network Flow Data Analysis.", In Big Data (BigData Congress), 2013
IEEE International Congress, 2013, pp. 431-432.
[20] C. Xiaoyun, S. Yujie, T. Xiaosheng, H. Xiaohong, and M. Yan, "On
measuring the privacy of anonymized data in multiparty network data
sharing.", Communications, China 10, no. 5, 2013, pp. 120-127.
[21] Y-D. Lin, P-C. Lin, S-H. Wang, I-W. Chen, and Y-C. Lai,
"PCAPLib: A system of extracting, classifying, and anonymizing real
packet traces.", IEEE Systems Journal, Issue 99, pp.1-12.
[22] J. Stanek, L. Kencl, and J. Kuthan, "Analyzing anomalies in anonymized
SIP traffic.", In IEEE 2014 IFIP Networking Conference, 2014, 2014,
pp. 1-9.
[23] D. Riboni, A. Villani, D. Vitali, C. Bettini, and L.V. Mancini,
"Obfuscation of Sensitive Data for Incremental Release of Network
Flows," IEEE Transactions on Networking, Issue 99, 2014, pp.1.
[24] T. Farah, and L. Trajkovic, "Anonym: A tool for anonymization of the
Internet traffic." In IEEE 2013 International Conference on Cybernetics
(CYBCONF), 2013, pp. 261-266.
[25] A.J. Slagell, K. Lakkaraju, and K. Luo, "FLAIM: A Multi-level
Anonymization Framework for Computer and Network Logs." In LISA,
vol. 6, 2006, pp. 3-8.
[26] J. Xu, J. Fan, M.H. Ammar, and S.B. Moon, "Prefix-preserving IP
address anonymization: Measurement-based security evaluation and a
new cryptography-based scheme.", In 10th IEEE International
Conference on Network Protocols, 2002, pp. 280-289.
[27] M. Burkhart, D. Brauckhoff, M. May, and E. Boschi, "The risk-utility
tradeoff for IP address truncation." In Proceedings of the 1st ACM
workshop on Network data anonymization, 2008, pp. 23-30.
[28] W. Yurcik, C. Woolam, G. Hellings, L. Khan, B. Thuraisingham,
"Measuring anonymization privacy/analysis tradeoffs inherent to sharing
network data", IEEE Network Operations and Management Symposium,
2008, pp.991-994.
[29] V. Ciriani, S.D.C. Vimercati, S. Foresti, and P. Samarati, “Theory of
privacy and anonymity”, in M. J. Atallah & M. Blanton (Eds.),
Algorithms and Theory of Computation Handbook, CRC Press, 2009, pp.
18-33.
[30] P. Samarati and L. Sweeney, “Protecting privacy when disclosing
information: k-anonymity and its enforcement through generalization
and suppression”, Technical Report SRI-CSL-98-04, SRI Computer
Science Laboratory, 1998
[31] L. Sweeney, “Achieving k-anonymity privacy protection using
generalization and suppression”, International Journal of Uncertainty
Fuzziness and Knowledge-Based Systems, 10(5), 2002, pp.571–588.
[32] T. Dalenius and S.P. Reiss, “Data-swapping: A technique for disclosure
control”, Journal of Statistical Planning and Inference, 6(1), 1982, pp.
73–85.
[33] J. Kim, “A Method For Limiting Disclosure in Microdata Based
Random Noise and Transformation”, In Proceedings of the Survey
Research Methods, American Statistical Association, Vol. A, 1986, pp.
370–374.
[34] J. Kim and W.E. Winkler, “Multiplicative Noise for Masking
Continuous Data”, Research Report Series, Statistics #2003-01,
Statistical Research Division. 2003, Washington, D.C. Retrieved from
http://www.census.gov/srd/papers/pdf/rrs2003-01.pdf
[35] C. Dwork, “Differential Privacy”, In M. Bugliesi, B. Preneel, V.
Sassone, & I. Wegener (Eds.), Automata languages and programming,
Vol. 4052, 2006, pp. 1–12. Springer.
[36] K. Mivule, “An Investigation Of Data Privacy and utility using machine
learning as a gauge”, Dissertation, Computer Science Department,
Bowie State University, 2014, ProQuest No: 3619387.
[37] M.H. Dunham, “Data Mining Introductory and Advanced Topics”,
2003, pp. 58–60, 97–99. Upper Saddle River, New Jersey: Prentice Hall.
[38] K. Mivule, “Utilizing noise addition for data privacy, an
overview”, In Proceedings of the International Conference on
Information and Knowledge Engineering (IKE), 2012, pp. 65–71.
[39] S.E. Coull, C.V. Wright, A.D. Keromytis, F. Monrose, and M.K. Reiter,
“Taming the devil: Techniques for evaluating anonymized network
data”, In Network and Distributed System Security Symposium, 2008,
pp. 125-135.

A Study of Usability-aware Network Trace Anonymization

  • 1. 1 A Study of Usability-aware Network Trace Anonymization Kato Mivule Los Alamos National Laboratory Los Alamos, New Mexico, USA kmivue@gmail.com Blake Anderson Los Alamos National Laboratory Los Alamos, New Mexico, USA banderson@lanl.gov Abstract— The publication and sharing of network trace data is a critical to the advancement of collaborative research among various entities, both in government, private sector, and academia. However, due to the sensitive and confidential nature of the data involved, entities have to employ various anonymization techniques to meet legal requirements in compliance with confidentiality policies. Nevertheless, the very composition of network trace data makes it a challenge when applying anonymization techniques. On the other hand, basic application of microdata anonymization techniques on network traces is problematic and does not deliver the necessary data usability. Therefore, as a contribution, we point out some of the ongoing challenges in the network trace anonymization. We then suggest usability-aware anonymization heuristics by employing microdata privacy techniques while giving consideration to usability of the anonymized data. Our preliminary results show that with trade-offs, it might be possible to generate anonymized network traces with enhanced usability, on a case-by-case basis using micro-data anonymization techniques. Keywords—Network Trace Anonymization; Usability; Differential Privacy; K-anonymity; Generalization I. INTRODUCTION While a number of network trace anonymization techniques have been presented in literature, data utility remains problematic due to the unique usability requirements by the different consumers of the privatized network traces. Yet still, a number of microdata privacy techniques from the statistical and computation sciences, are difficult to implement when anonymizing network traces due to the low usability of results. Moreover, finding the right proportionality between anonymization and data utility of network trace data is intractable and requires trade-offs on a case-by-case basis, after a careful consideration of privacy needs stipulated by policy makers, and likewise the usability requirements of the researchers, who in this case, are the consumers of the anonymized data. Furthermore, a generalized approach fails to deliver unique solutions, as each entity will have unique data privacy requirements. In this study, we take a look at the structure of the network trace data. We vertically partition the network trace data into different attributes and apply micro- data privatization techniques separately for each attribute. We then suggest usability-aware anonymization heuristics for the anonymization process. While a number of anonymization attacks have been presented in literature, the main goal of this study was generation of anonymized network traces with better data usability. Therefore, the focus of the suggested heuristics and preliminary results, is about the generation of anonymized usability-aware network trace data, using privacy techniques covered in the statistical disclosure control domain; that include the following: Generalization, Noise addition and Multiplicative noise perturbation, Differential Privacy, and Data swapping [38]. A measure of usability by quantifying descriptive and inference statistics of the anonymized data in comparison with that of the original data is also presented. 
Furthermore, we apply frequency distribution analysis and unsupervised learning techniques in the measure of usability for the unlabeled data. The rest of the paper is organized as follows: In Section II, we present a review of related work, and definition of important terms pertaining to this paper. In Section III, we present methodologies and usability-aware anonymization heuristics. In Section IV, the experiment and results are given. Finally in Section V, the conclusion, recommendations, and future works are presented. II. RELATED WORK One of the challenges of anonymizing network traces, is how to keep the structure and flow of the data intact so as to provide usability to the consumer of the anonymized data. In such efforts, Maltz et al. (2004) demonstrated that network trace data could be anonymized while preserving the structure of the original data [1]. Additionally, Maltz et al. (2004) observed and noted that some of the challenges in anonymizing network traces included figuring out attributes in the network trace that could leak sensitive information, and how to anonymize the data such that the original configurations are preserved [1]. Observations by Maltz et al. are still relevant today, especially when considering the intractability between privacy and usability. On the other hand, Slagell, Wang, and Yurcik (2004) proposed Crypto-Pan, a network trace anonymization tool that employs cryptographic techniques in the privatization of network trace data [2]. While anonymization using cryptographic means might be effective in concealing sensitive data, usability of the anonymized data is always a challenge. Bishop, Crawford, Bhumiratana, Clark, and Levitt (2006), observed that one of
  • 2. 2 the problems in the anonymization of network traces, is that when handling IP addresses, the set of available addresses is finite, thus setting a limit to any anonymization prospects [3]. Each octet in the IP address would handle a range of 0 to 255. For instance, it would not make much sense to have an anonymized IP address with an octet value of 345. This limitation makes the data vulnerable to de-anonymization attacks. On the issue of de-anonymization attacks, Coull, Wright, Monrose, Collins, and Reiter (2007) presented inference techniques for de-anonymizing and detecting network topologies in anonymized network trace data [4]. Coull et al. showed that topological data could be deduced as an artifact of functional network packet traces, if the data on activity of hosts can be utilized as an advantage to prevent a successful obfuscation of the network traces [4]. Moreover, Coull et al., pointed out that obfuscating network trace data is not a trivial task as publishers of the data need to be aware of the tension between balancing privacy and data utility needs for anonymized network traces [4]. Additionally, Ribeiro, Chen, Miklau, and Towsley (2008), showed that systematic attacks on prefix-preserving anonymized network traces, could be done by adversaries using modest amount of publicly available information about a network and employing attack techniques such as finger printing [5]. However, Ribeiro et al. anticipated that their proposed attack methodologies would be employed in evaluating worst-case vulnerabilities and finding trade-offs between privacy and utility in prefix-preserving privatization of network traces [5]. Therefore, while researchers might have an interest in anonymized data sets that maintain the structure and flow of the original data, curators of that data have to contend with the fact that such prefix-preserving anonymization is subject to de- anonymization attacks. A comprehensive reference model was presented by Gattani and Daniels (2008), in which they outlined that entities needed to formulate the problem of anonymizing network traces [6]. Gattani and Daniels (2008) noted that the anonymization procedure always aims at the following three goals [6]: (i) defending the confidentiality of users, (ii) obfuscating the inner structure of a network, and (iii) generating anonymized network traces with acceptable levels of usability [6]. However, Gattani and Daniels (2008) observed that attaining those three anonymization goals is problematic, as removing too much sensitive information from a network data trace only reduces the usability of the anonymized network traces [6]. Additionally, Gattani and Daniels (2008), categorized attacks on anonymized data categorized as, (i) active data injection attacks, (ii) known mapping attacks, (iii) network topology inference attacks, and (iv) cryptographic attacks [6]. On the categorization of attacks, King, Lakkaraju, and Slagell (2009) presented a taxonomy of attacks on anonymization techniques with the aim of helping curators of the privatization process negotiate trade-offs between data utility and anonymization [7]. King et al., classified attacks on anonymization methods as (i) fingerprinting, (ii) structure recognition, (iii) known mapping, (iv) data injection, and (v) cryptographic attacks [7]. 
A combined categorization of attacks on anonymization techniques, from Gattani and Daniels, and King et al., would then be listed as follows [7] [6]: (i) Fingerprinting attacks: in this this category of attacks, attributes of anonymized data are compared with traits of known network structures to uncover a relationship between the anonymized and non-anonymized data. (ii) Data injection attacks: in this type of exploit, an attacker injects pseudo-traffic data in a network trace before anonymization process and uses the pseudo-traffic traces to de-anonymize the network traces and network structure. (iii) Structure recognition attacks: in this type of exploit, an attacker seeks to determine the structure between objects in the anonymized data to discover multiple relations between anonymized and non-anonymized data. (iv) Network topology inference: similar to known mapping attacks, this category of exploits seeks to retrieve the network topology map by de- anonymizing the nodes that make up the vertices of the network, the edges between the nodes that represent the connectivity and the routers. (v) Known mapping attacks: in this category of exploit, the attacker relies on external data (auxiliary data) to find a mapping between the anonymized network trace data and the original network trace data in order to retrieve the original IP addresses. (vi) Cryptographic attacks: in this category of attacks, exploits are carried out to break cryptographic algorithms used to encrypt the network traces. A comparative analysis was done by Coull, Monrose, Reiter, and Bailey (2009) in which they pointed out the similarities and differences between network data anonymization and microdata privatization techniques, and how microdata obfuscation methods could be applied to anonymize network traces [8]. Coull, et al. observed that uncertainties did exist about the effectiveness of network data anonymization, from both methodological and policy view, with the research community in need for more study to understand the implications of publishing anonymized network data and the utility of such data to researchers [8]. Furthermore, Coull, et al. suggested that the extensive work that exists in the statistical disclosure control discipline could be employed by the network research community towards the privatization of network flow data [8]. On network trace packet anonymization, Foukarakis, Antoniades, and Polychronakis (2009), proposed the anonymization of network traces at the packet level – in the payload of a packet, due to inadequacies found in various network trace anonymization techniques [9]. Foukarakis et al., suggested identifying revealing information contained in the shell-code of code injection attacks, and anonymizing such packets to grant confidentiality in published network attack traces [9]. However, on the subject of IP-flow intrusion detection methods, Sperotto et al. (2010) presented an overview of IP-flow intrusion detection approach and highlighted the classification of attacks, and defense methods and how flow-based method can be used to discover scans, worms, botnets and denial of service (DoS) attacks [10]. Furthermore Sperotto et al. highlighted two types of sampling; packet sampling whereby a packet is deterministically chosen based on a time interval for analysis; and flow sampling in which a sample flow is chosen for analysis [10]. At the same
time, Burkhart et al. (2010), in their review of anonymization techniques, showed that current anonymization techniques are vulnerable to a series of injection attacks, in which attacker packets are inserted into the network flow prior to anonymization and later retrieved, thus revealing vulnerabilities and patterns in the anonymized data [11]. As a mitigation to injection attacks, Burkhart et al. suggested that anonymization of network flow data should be done as part of a comprehensive approach that includes both legal and technical perspectives on data confidentiality [11]. Meanwhile, McSherry and Mahajan (2011) showed that differential privacy could be employed to anonymize network trace data; yet despite the privacy guarantees provided by differential privacy, they acknowledged the challenge of balancing usability and privacy, as the usability of the privatized data suffers from the excessive noise introduced by the anonymization [12] [13]. On real-time interactive anonymization, Paul, Valgenti, and Kim (2011) proposed the real-time Netshuffle anonymization technique, whereby distortion is applied to a complete graph to prevent inference attacks in network traffic [14]. Netshuffle works by employing the k-anonymity methodology on network traces, ensuring that each trace record appears at least k times, with k > 1; shuffling is then applied to the k-anonymized records, making it difficult for an attacker to decipher the distorted data [14]. A network trace obfuscation methodology, (k, j)-obfuscation, was proposed by Riboni, Villani, Vitali, Bettini, and Mancini (2012), in which a network flow is considered obfuscated if it cannot be linked, with high assurance, to its source and destination IP addresses [15]. Riboni et al. observed from their implementation of (k, j)-obfuscation that the large set of network flows maintained the utility of the original network trace [15]. However, the context of data utility remains challenging, as each consumer of privatized data will have unique usability requirements and different levels of needed assurance; utility therefore becomes constrained to a case-by-case basis, depending on an entity's privacy and usability needs. On the issue of preserving IP consistency in anonymized data, Qardaji and Li (2012) observed that full prefix-preserving IP anonymization suffers from a number of attacks, yet from a usability perspective some level of consistency is required in anonymized IP addresses [16]. To mitigate this problem, Qardaji and Li proposed maintaining pseudonym consistency by dividing flow data into buckets based on temporal closeness and privatizing the flows in each bucket separately, thus maintaining consistency within each bucket but not globally across all buckets [16]. Mendonca, Seetharaman, and Obraczka (2012) proposed AnonyFlow, an interactive anonymization technique that provides endpoint privacy by preventing the tracking of source behavior and location in network data [17]. However, Mendonca et al. acknowledged that AnonyFlow does not address issues of complete anonymity, data security, steganography, and network trace anonymization in non-interactive settings [17].
On generating synthetic network traces, Jeon, Yun, and Kim (2013) proposed an anomaly-based intrusion detection system (A-IDS) to generate pseudo-network traffic for the obfuscation of real sensitive network traffic in supervisory control and data acquisition (SCADA) systems [18]. An overview of network data anonymization was presented by Nassar, al Bouna, and Malluhi (2013), in which they highlighted the need for anonymization algorithms that grant privacy with an optimal risk-utility trade-off [19]. On using entropy and similarity distance measures, Xiaoyun, Yujie, Xiaosheng, Xiaohong, and Yan (2013) employed similarity distance and entropy techniques in the quantification of anonymized network trace data [20]. Xiaoyun et al. proposed two types of similarity measures: (i) external similarity, in which distance measurements are done to compute the probability that an adversary will obtain a one-to-one mapping relation between the anonymized and the original data, based on auxiliary knowledge; and (ii) internal similarity, in which distance measurements are done on the privatized and the original data to indicate how distinguishable or indistinguishable the data sets are [20]. On the extraction, classification, and anonymization of packet traces, Lin, Lin, Wang, Chen, and Lai (2014) observed that capturing and sharing real network traffic faces two challenges: first, various protocols are associated with the packet traces, and second, such packet traces tend not to be well classified before deep packet anonymization [21]. Therefore, Lin et al. proposed the PCAPLib methodology for the extraction, classification, and deep packet anonymization of packet traces [21]. In their work on the Session Initiation Protocol (SIP) used in multimedia communication sessions, Stanek, Kencl, and Kuthan (2014) pointed out that current network trace anonymization techniques are insufficient for SIP traces due to the data format of the SIP trace, which includes the IP address, the SIP URI, and the e-mail address [22]. To mitigate this problem, Stanek et al. proposed SiAnTo, an anonymization methodology that replaces SIP information with non-descriptive but matching labels [22]. More recently, Riboni, Villani, Vitali, Bettini, and Mancini (2014) cautioned that current network trace anonymization techniques are vulnerable to various attacks, while at the same time it is problematic to apply microdata privatization methods in obfuscating network traces [23]. Moreover, Riboni et al. noted that current obfuscation methods depend on assumptions about an adversary's intentions, which are challenging to model, and do not guarantee privacy against background knowledge attacks [23]. Table I summarizes some of the network trace anonymization challenges outlined in the literature over the past ten years.

A. Network trace anonymization techniques

In the following section, a review of some of the common network trace anonymization techniques is presented [24] [25] [26] [27] [28] [16]: (i) Black marker technique: in this
method, sensitive values are erased or substituted with fixed values.

TABLE I. SUMMARY OF NETWORK TRACE ANONYMIZATION CHALLENGES

Maltz et al. (2004): Challenge of identifying attributes to anonymize while conserving usability.
Slagell et al. (2004): Crypto-PAn, cryptography to anonymize IP addresses; usability a challenge.
Bishop et al. (2006): Anonymization of IP addresses problematic; the set of IP addresses is finite.
Coull et al. (2007): Obfuscation not a trivial task due to the tension between privacy and usability.
Ribeiro et al. (2008): Prefix-preserving anonymized data subject to fingerprinting attacks.
King et al. (2009): Taxonomy of attacks on anonymization techniques; anonymization challenges.
Coull et al. (2009): Comparison between network and microdata anonymization; significant differences.
Foukarakis et al. (2009): Network trace anonymization at the packet level; a challenge.
Burkhart et al. (2010): Injection attacks on anonymized network trace data.
McSherry and Mahajan (2011): Differential privacy anonymization of network trace data.
Paul, Valgenti, and Kim (2011): Real-time anonymization with k-anonymity.
Riboni et al. (2012): (k, j)-obfuscation; a network flow is obfuscated if it cannot be linked to the original data with high assurance.
Qardaji and Li (2012): Global prefix consistency is subject to attacks.
Mendonca et al. (2012): Interactive network trace anonymization.
Jeon, Yun, and Kim (2013): Synthetic (anonymized) network trace data generation.
Nassar et al. (2013): Balance between utility and privacy needed; still a problem.
Farah and Trajkovic (2013): Network trace anonymization techniques, an overview.
Stanek et al. (2014): Session Initiation Protocol (SIP) anonymization and its challenges.
Riboni et al. (2014): Caution with current network anonymization techniques; vulnerable to attacks.

(ii) Enumeration technique: in this scheme, sensitive values in a sequence are replaced with an ordered sequence of synthetic values. (iii) Hash technique: unique values are substituted with a fixed-size bit string. (iv) Partitioning technique: with the partitioning method, revealing values are partitioned into a subset of values, and each of the values in the subset is replaced with a generalized value. For example, the IP address 141.121.10.12 could be partitioned into four octets and the last two octets replaced with zero values, giving 141.121.0.0. (v) Precision degradation technique: highly specific values of a time-stamp attribute are removed when employing the precision degradation method. (vi) Permutation technique: a random permutation is applied to map non-anonymized IP and MAC addresses to a set of available addresses. (vii) Prefix-preserving anonymization technique: in this technique, values of an IP address are replaced with synthetic values in such a way that the original structure of the IP address is kept; the prefix values of the IP address structure are preserved. Prefix preservation can be applied fully or partially to the IP address: fully prefix-preserving anonymization maps the full structure of the original IP address into the anonymized data, while partially prefix-preserving anonymization preserves a select portion of the original IP address structure, for example the first two octets. (viii) Random time shift technique: this methodology works by applying a random value as an offset to each value in the field.
(ix) Truncation technique: with this technique, part of the IP or MAC address is suppressed or deleted, while the remaining portion of the address is kept intact. (x) Time unit annihilation: in this partitioning anonymization methodology, part of the time-stamp is deleted and replaced with zeros. Table I gives a summary of the ongoing challenges, reported in the literature, in anonymizing network traces. Although a number of network trace anonymization solutions have been proposed in the literature, the usability of the anonymized data remains problematic. While a number of challenges exist, this study focuses on the challenge of usability-aware anonymization of network traces.

B. Statistical disclosure control techniques

The following are some of the main microdata privatization methods used: Suppression: in this technique, revealing and sensitive data values are deleted from a data set at the cell level [29]. Generalization: to achieve confidentiality for revealing values in an attribute, a single value is allocated to a group of sensitive values in the attribute [30]. K-anonymity: in this method, data privacy is enforced by requiring that all values in the quasi-attributes be repeated k times, such that k > 1, thus providing confidentiality and making it harder to uniquely distinguish individual values. K-anonymity employs both generalization and suppression methods to achieve k > 1 [31]. Data swapping: data swapping is a data privacy technique that involves exchanging sensitive cell values with other cell values in the same attribute while keeping intact the frequencies and statistical traits of the original data, thus making it difficult for an attacker to map the privatized values to the original record [32]. Noise addition: noise addition is a data privacy method that adds random values (noise) to revealing and sensitive numerical values in the original data to ensure confidentiality. The random values are usually generated using the mean and standard deviation of the original values [33]:

$X_i + \varepsilon_i = Z_i$ (1)

Multiplicative noise: similar to noise addition, random values, generated using the mean and variance of the original data
values, are then multiplied with the original data, generating a privatized data set [34]:

$X_i \cdot \varepsilon_i = Z_i$ (2)

where X is the original data, Z is the privatized data, and ε denotes the random values. Differential privacy: similar to noise addition, differential privacy imposes privacy by adding Laplace noise to query results from the database, such that it cannot be distinguished whether a particular value has been adjusted in that database or not, making it more difficult for an attacker to decode items in the database [35]. ε-differential privacy is satisfied if the results of a query run on databases D1 and D2 are probabilistically similar, meeting the following condition [35]:

$\frac{P[q_n(D_1) \in R]}{P[q_n(D_2) \in R]} \leq e^{\varepsilon}$ (3)

where D1 and D2 are the two databases; P is the probability of the perturbed query results on D1 and D2; $q_n(\cdot)$ is the privacy-granting (perturbation) procedure; $q_n(D_1)$ and $q_n(D_2)$ are the privacy-granting procedures on the query results from databases D1 and D2 respectively; R is the set of perturbed query results from the two databases; and $e^{\varepsilon}$ is the exponential of the epsilon value. Differential privacy can be implemented as follows [36]: (i) Run the query on the database, where $f(x)$ is the query function. (ii) Calculate the most influential observation:

$\Delta f = \max \lVert f(D_1) - f(D_2) \rVert$ (4)

(iii) Calculate the Laplace noise scale:

$b = \Delta f / \varepsilon$ (5)

(iv) Add the Laplace noise to the query results:

$DP = f(x) + \mathrm{Laplace}(0, b)$ (6)

(v) Publish the perturbed query results in interactive (query responses) or non-interactive (macro or micro data) mode.

C. Metrics used to quantify usability in this study

The Shannon entropy: entropy is essentially used to measure the amount of randomness and uncertainty in a data set; if all values in a set of information fall into one category, then entropy is zero. Probability is used to quantify the randomness of elements in an information set; normalized entropy values range from 0 to 1, reaching the upper bound when all probabilities are equal [37] [36]. Entropy is formally described by the following formula [37]:

$H(p_1, p_2, \ldots, p_n) = \sum_{i=1}^{n} p_i \cdot \log \frac{1}{p_i}$ (7)

where $p_i$ is a probability and $H(p_1, p_2, \ldots, p_n)$ is the entropy over the $p_i$. Correlation metric (between original data X and privatized data Z): the correlation $r_{xz}$ computes the inclination and tendency of an additive linear relation between two data sets; the correlation is dimensionless, independent of the units in which the data points x and z are measured, and is expressed as follows [38]:

$r_{xz} = \frac{\mathrm{Cov}(x, z)}{\sigma_x \sigma_z}$ (8)

where $\mathrm{Cov}(x, z)$ is the covariance of X and Z, and $\sigma_x \sigma_z$ is the product of the standard deviations of X and Z. If $r_{xz} = -1$, a negative linear relation exists between X and Z; if $r_{xz} = 0$, no linear relation exists between X and Z; when $r_{xz} = 1$, a strong positive linear relation exists between X and Z. Descriptive statistics metric: descriptive statistics (DS) such as the mean, standard deviation, and variance are used in quantifying how much distortion there is between the anonymized and original data. The larger the difference, the more privacy, but also an indication of less usability; the smaller the difference, the more usability, but perhaps less privacy. The format used in the quantification is always of the form [36]:

$\mathrm{Usability} = DS(Z) - DS(X)$ (9)

where Z is the anonymized data, X is the original data, and DS the descriptive statistic. A short sketch of these techniques and metrics follows.
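To make the above concrete, the following is a minimal Python sketch of noise addition (Eq. 1), multiplicative noise (Eq. 2), the Laplace mechanism of differential privacy (Eqs. 4 to 6), and the normalized entropy and descriptive-statistics usability gauge (Eqs. 7 and 9). The function names and parameter defaults are ours for illustration, not the paper's implementation; the sketch assumes only NumPy.

```python
import numpy as np

rng = np.random.default_rng(42)

def noise_addition(x, mu=0.0, sigma=1.0):
    """Eq. (1): Z_i = X_i + eps_i, with eps drawn from a chosen
    mean and standard deviation (illustrative defaults)."""
    return x + rng.normal(mu, sigma, size=x.shape)

def multiplicative_noise(x, mu=1.0, sigma=0.1):
    """Eq. (2): Z_i = X_i * eps_i."""
    return x * rng.normal(mu, sigma, size=x.shape)

def laplace_mechanism(f, d1, d2, epsilon):
    """Eqs. (4)-(6): perturb the query f with Laplace(0, b) noise,
    where delta_f bounds the change of f across the neighboring
    databases D1 and D2, and b = delta_f / epsilon."""
    delta_f = np.max(np.abs(f(d1) - f(d2)))   # Eq. (4)
    b = delta_f / epsilon                     # Eq. (5)
    return f(d1) + rng.laplace(0.0, b)        # Eq. (6)

def normalized_entropy(values):
    """Eq. (7), normalized to [0, 1] by dividing by log(n)."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    h = float(-(p * np.log2(p)).sum())
    return h / np.log2(len(p)) if len(p) > 1 else 0.0

# Eq. (9): usability gauge on one descriptive statistic (the mean).
x = rng.uniform(0, 100, size=1000)
z = noise_addition(x, mu=x.mean(), sigma=x.std())
print("Usability (mean gap):", z.mean() - x.mean())
print("Entropy X vs Z:", normalized_entropy(x.round()), normalized_entropy(z.round()))
```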
Distance measures metric (Euclidean distance): for distance measures, we employed clustering with k-means to evaluate how well the clustering of the original data compares with that of the anonymized data. In this case, the Euclidean distance is used for k-means clustering and is expressed as follows [39]:

$\mathrm{distance}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$ (10)

The Davies-Bouldin index was also used in the evaluation of how well the clustering performed. The Davies-Bouldin index (DBI) is expressed as follows [21]:

$DBI = \frac{1}{N} \sum_{i=1}^{N} D_i$ (11)

where

$D_i \equiv \max_{j \neq i} R_{i,j}$ (12)

and

$R_{i,j} \equiv \frac{S_i + S_j}{M_{i,j}}$ (13)

Here, $R_{i,j}$ is a quantification of how good the clustering is, $S_i$ and $S_j$ are the within-cluster distances of clusters i and j, and $M_{i,j}$ is the distance between the two clusters. Classification error metric: with the classification error test, both the original and anonymized data are passed through machine learning and the classification error (or accuracy) is returned. The classification error (CE) of the anonymized data is subtracted from that of the original.
The larger the difference, the more privacy (due to distortion), though this might be an indication of low usability. However, a smaller difference might indicate better usability but then low privacy, as the anonymized results might be closer to the original data in similarity. Depending on the machine learning algorithm used, the classification error metric takes the form [36]:

$\mathrm{Usability\ Gauge} = CE(Z) - CE(X)$ (14)

where Z is the anonymized data, X the original data, and CE the classification error.

III. METHODOLOGY

In this section, we describe the implemented methodology; in this case, the heuristics used in the anonymization of network trace data, within the context of usability, while at the same time meeting privacy requirements. The goal of the heuristics is to provide anonymized data that could be used by researchers, with statistical traits close to those of the original data. The trade-off in this case is that we tilt towards more utility while making it harder for an attacker to recover the original data, assuming that the attacker has no prior knowledge. Because of the unique data structure of network traces, a single generalized approach is not applicable to anonymizing all the network trace attributes. In our approach, we apply a hybrid of anonymization heuristics for each group of related attributes. Combinations of microdata anonymization techniques were used in this study, as illustrated in Figure 1. The following attributes were anonymized in the network trace data: (i) Start and End Time (time-stamp), (ii) Source IP and Destination IP, (iii) Protocol, (iv) Source Port and Destination Port, (v) Source Packet Size and Destination Packet Size, (vi) Source Bytes and Destination Bytes, (vii) TOS Flags. However, due to space constraints, we only present results for the time-stamp and IP address attributes.

Figure 1: An illustration of the proposed anonymization heuristics for the network trace data.

A. Enumeration with multiplicative perturbation

To preserve the flow structure of the time-stamp, we employed enumeration with multiplicative perturbation, a heuristic that combines the multiplicative noise technique from microdata privatization with enumeration from network trace anonymization. The enumeration with multiplicative perturbation heuristic is implemented as follows, with a sketch given after the steps: Step (i): A small epsilon constant value is chosen between 0 and 1. Data curators could conceal this random value, arbitrarily chosen between 0 and 1, as an additional layer of confidentiality. Step (ii): The small epsilon constant value is then multiplied with the original data (the time-stamp, both Start and End Time attributes), generating an enumerated set. Step (iii): The generated enumerated data is then added to the original data, producing an anonymized data set. Step (iv): A test for usability is done, using descriptive statistical analysis, entropy, correlation, and unsupervised learning via clustering (k-means). Step (v): If the desired threshold is met, the anonymized data is published. The goal of this heuristic is to keep the time flow structure intact and similar to the original data while at the same time anonymizing the time series values. In this case, the anonymized time series data should generate usability results similar to the original.
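Below is a minimal Python sketch of steps (i) through (iv), assuming NumPy and scikit-learn; the constant value, the synthetic timestamps, and the choice of k = 5 are our illustrative assumptions (k = 5 mirrors the experiment in Section IV). Note that, as literally described, a single secret constant yields a monotone linear rescaling, so a per-record random factor would be one way to trade correlation for stronger privacy.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(0)

def enumeration_mult_perturbation(timestamps, c):
    """Steps (i)-(iii): enumerate by multiplying a secret constant
    0 < c < 1 with the data, then add the enumerated set back to
    the original: Z = X + c*X. Ordering (time flow) is preserved."""
    assert 0.0 < c < 1.0
    enumerated = c * timestamps        # step (ii)
    return timestamps + enumerated     # step (iii)

# Illustrative start/end times (epoch seconds), not the LANL data.
start = np.sort(rng.uniform(1.12e9, 1.13e9, 5000))
end = start + rng.uniform(1, 120, 5000)
X = np.column_stack([start, end])
Z = enumeration_mult_perturbation(X, c=0.73)

# Step (iv): usability checks - correlation and clustering quality.
r = np.corrcoef(X[:, 0], Z[:, 0])[0, 1]
labels_x = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
labels_z = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(Z)
print("correlation:", r)
print("DBI original:", davies_bouldin_score(X, labels_x))
print("DBI anonymized:", davies_bouldin_score(Z, labels_z))
```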
B. Generalization and differential privacy

The IP address is one of the most challenging attributes to anonymize, since each octet of the IP address is limited to a finite set of numbers, from 0 to 255. This makes the IP address attribute vulnerable to attackers attempting to de-anonymize the privatized network trace [3]. With such restrictions, the curator of the data is left with the choice of completely anonymizing the IP address by employing full perturbation techniques, which in turn distorts the flow structure and prefix of the IP address, and thus yields poor data usability. One solution to this problem would be to employ heuristics that grant anonymization while keeping the prefix of the IP address intact. However, full IP address prefix-preserving anonymization has been shown to be prone to de-anonymization attacks, presenting yet another challenge [5]. Therefore, to deal with this problem, we suggest a partial prefix-preserving heuristic in which differential privacy and generalization are used, implemented as follows: Octet 1 anonymization: the IP address is split into four octets. Generalization is applied to the first octet to partially preserve
the prefix of the anonymized IP address. The goal is to give the users of the anonymized data some level of usability by being able to get a synthetic flow of the IP address structure in the network trace. Step (i): A small epsilon constant value is chosen and applied to the first octet using noise addition or multiplicative noise (the value is added to, or multiplied with, the data). The goal is to preserve the flow structure in the first octet. Step (ii): Frequency count analysis is done at this stage to check that none of the first-octet values from the original data reappear in the anonymized data. Step (iii): If first-octet values reappear in the anonymized data, generalization is done by replacing the reappearing values with the most frequent values in the anonymized first octet. Step (iv): Finally, generalization and k-anonymity are applied to ensure that no unique values appear and that all values in the first octet appear at least k times, with k > 1. Step (v): A test for usability, comparing the original and anonymized first-octet values, is done. Octets 2, 3, and 4 anonymization: to make it difficult to de-anonymize the full IP address, randomization using differential privacy is applied to the remaining three octets. However, since each octet is limited to the finite set 0 to 255, the differential privacy perturbation process will generate some values that exceed 255; for instance, it would not make sense to have an octet value of 350. To mitigate this situation, a control statement is introduced at the end of the differential privacy process to exclude all values outside the valid class and octet range of an IP address; in this case, any values greater than 255 are excluded from the end results of the perturbation process. Differential privacy is applied to each of the three octets vertically and separately. Step (i): A vertical split of octets 2, 3, and 4 into separate attributes is done. Step (ii): Anonymization using differential privacy is done on each attribute (octet) separately. Step (iii): Test to ensure that the anonymized values in each octet are in range, from 0 to 255. Step (iv): If an anonymized value in an octet exceeds the 0 to 255 range, return a generalized value using the most frequent value within that range. Step (v): Test for usability. Step (vi): Combine all octets into a full anonymized IP address.

IV. RESULTS

Preliminary results are presented in this section. However, due to space limitations in this publication, only results for the time-stamp and IP address attributes are presented. Real 2014 network trace (NetFlow) data provided by Los Alamos National Laboratory were used in this experiment. A total of 500,000 network flow records were anonymized in this study. Microdata obfuscation techniques were applied for the anonymization process, and each attribute of the NetFlow trace was anonymized separately.

A. Timestamp anonymization and usability results

Descriptive statistical analysis was done on both the original and anonymized data sets, as shown in Table II. The aim was to study the statistical traits of both the original and anonymized data sets and show any similarities. In this case, the statistical traits of the anonymized data show an augmentation of the original data, in effect the generation of a synthetic data set. For instance, the original means of the start time and end time were 1123355142 and 1123355214 respectively, while those of the anonymized data set were 1944808589 and 1944808714.
The differences between the anonymized and original data were 821453447 and 821453500 respectively. A larger difference might indicate more privacy and less usability, while a smaller difference might indicate better usability but less privacy. The results presented in Table II indicate a middle ground, with both privacy and usability needs met after trade-offs (the difference).

TABLE II. STATISTICAL TRAITS OF ORIGINAL AND ANONYMIZED TIMESTAMP DATA

However, to meet the requirements of different users of the anonymized data, a fine-tuning of the parameters in the anonymization heuristics would need to be done. Additionally, the normalized Shannon entropy results, as shown in Table II, were similar for both the original and anonymized data, at approximately 0.77 and 0.76 for the start and end times respectively. The entropy results indicate that the distortion and uncertainty in the two data sets might be similar. While the entropy results might be good for usability, it could likewise be argued that the privacy levels might be inadequate, since the two data sets are similar in that regard. However, the correlation values between the anonymized and original data were 0.532 and 0.534 for the start and end time attributes respectively. These results could indicate that while correlations exist between the two data sets, the significance is not that high, since the values do not approach 1.

Figure 2: K-means clustering results for the original start and end time data.
The results might indicate that privacy is maintained in the anonymized data, with an acceptable level of usability. In Figure 2, results from clustering the original network trace data (the time-stamp attribute) are presented. The x-axis in Figure 2 represents the start time, while the y-axis represents the end time of the activity in the network trace. The value of k for the k-means was set to 5 in this experiment. From an anecdotal point of view, we can see that the clustering results in Figure 2 have their own skeletal structure. However, this is not the case in Figure 3. In Figure 3, data privacy using noise addition was applied idealistically, without much consideration given to the issue of usability.

Figure 3: Idealistic privacy application and clustering results.

An anecdotal view of the results in Figure 3 might point to better privacy, since the skeletal cluster structure of the original data was dismantled and replaced with a new skeletal cluster structure.

Figure 4: K-means clustering for the anonymized start and end time data.

However, usability remains a challenge, as the anonymized clustering results are far from being close to the original clustering. In the case of this study, the aim was to obtain clustering results with better usability; therefore, a re-tuning of the parameters in the data privacy procedure is done to achieve better usability. On the other hand, the goal of using cluster analysis with k-means was to analyze how the unlabeled original network trace data would perform in comparison to the anonymized data. Furthermore, the Davies-Bouldin criterion shows a value of 0.522, as depicted in Table II, indicating how well the clustering algorithm (k-means) performed with the original time-stamp (start and end times) data. In Figure 4, the clustering results (with k=5 for the k-means) for the anonymized data are presented, with the x-axis showing the start time and the y-axis presenting the end time.

Figure 5: K-means cluster performance showing the average within-centroid distance and the number of items in each cluster.

The Davies-Bouldin criterion for the clustering performance on the anonymized data was 0.393, as shown in Table II, a value lower than that of the original data and an indication of better clustering. However, while an anecdotal view of the plots shows that the cluster results look similar, the number of items in each cluster in the anonymized data differs from that of the original, as shown in Figure 5. For instance, in Figure 5, the number of items in cluster 0 for the original data is 310678, while that of the anonymized data is 291002; the trade-off would be the difference of 19676 items. The challenge remains how to effectively balance anonymity and usability requirements, with trade-offs. In this case, if the usability threshold is not met, then the curator can fine-tune the anonymization parameters. The average within-centroid distance returned a lower value for the anonymized data, at 77865, than for the original data, at 157093, with the lower value indicating better clustering, as shown in Figure 5.

B. Source and destination IP address anonymity results

The IP address remains a challenging attribute to anonymize due to the finite nature of IP addresses. Each octet is limited to a range of 0 to 255, and obfuscation becomes constricted to that range. As we hinted earlier, it would not make any sense to have octet values of, for instance, 270 or 450. A sketch of the heuristic from Section III.B is given below.
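The following is a minimal Python sketch of the partial prefix-preserving heuristic of Section III.B, assuming NumPy; the epsilon, the sensitivity bound, the simple constant shift standing in for the octet 1 noise-plus-generalization step, and the function names are our illustrative assumptions, not the exact parameters used in the experiment.

```python
import numpy as np

rng = np.random.default_rng(7)

def dp_octet(octet, epsilon=2.0, sensitivity=255):
    """Octets 2-4: Laplace perturbation with a control statement that
    keeps results in the valid 0-255 range, replacing out-of-range
    draws with the most frequent in-range value (generalization)."""
    b = sensitivity / epsilon
    noisy = np.rint(octet + rng.laplace(0.0, b, size=octet.shape))
    in_range = (noisy >= 0) & (noisy <= 255)
    if in_range.any():
        vals, counts = np.unique(noisy[in_range], return_counts=True)
        mode = vals[np.argmax(counts)]
    else:
        mode = 0  # degenerate fallback, for the sketch only
    return np.where(in_range, noisy, mode).astype(int)

def anonymize_ips(ip_strings, octet1_shift=60):
    """Vertical split into four octets; octet 1 gets a constant shift
    (a stand-in for the noise-plus-generalization step that partially
    preserves the prefix flow), octets 2-4 get dp_octet."""
    octets = np.array([[int(o) for o in ip.split(".")] for ip in ip_strings])
    o1 = np.clip(octets[:, 0] + octet1_shift, 0, 255)
    rest = [dp_octet(octets[:, k]) for k in (1, 2, 3)]
    return [".".join(str(v) for v in row) for row in zip(o1, *rest)]

print(anonymize_ips(["141.121.10.12", "141.121.10.99"]))
```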
In this section, we present preliminary results on the anonymization and usability of the source and destination IP attribute values using the heuristics in Section III. Correlation: the correlation between the original and anonymized data, as shown in Table III, for the first octet of
the source and destination IP shows values of 0.9 and 1 respectively. These strong correlation values are indicative of a strong linear relationship between the original and anonymized octet 1 data. The first octet of the IP address was anonymized using noise addition and generalization to keep the flow structure similar to the original. Since partial prefix-preserving anonymization was used, it is noteworthy that there are strong correlation values between the original and anonymized data for the first-octet IP values.

TABLE III. STATISTICAL TRAITS OF ORIGINAL AND ANONYMIZED SOURCE AND DESTINATION IP ADDRESSES

Our view is that a researcher could still derive general network information from the flow structure presented by the first octet in the IP address without compromising the specifics of the three inner octets. Yet the correlation between the anonymized and original data for octets 2, 3, and 4 shows values of 0 for the destination IP addresses and minimal values of -0.081, 0.093, and 0.213 for the source IP addresses, indicating that there is a very weak relationship between the anonymized and original data for octets 2, 3, and 4. However, the very low correlation values might be a good indicator of stronger privacy, since we employed differential privacy in the anonymization of octets 2, 3, and 4; the correlation between the anonymized and original data would therefore be nonexistent, or at least very minimal, due to the differential privacy randomization. Hence the partial prefix-preserving heuristic works in this case: the user of the anonymized data is only able to derive information from the first octet, while all other internal IP address information is kept confidential. Entropy: the Shannon entropy test was done on both the original and anonymized IP addresses to study the uncertainty and randomness in the data sets. The normalized Shannon entropy values range between 0 and 1, with 0 indicating certainty and 1 indicating maximum uncertainty. As shown in both Table III and Figure 6, the entropy values for octet 1 in both the original and anonymized data are approximately 0.1, indicative of certainty of values and thus maintenance of the flow in the first octet. However, there is much less certainty in octets 3 and 4 of the original data and in octets 2, 3, and 4 of the anonymized data, though the anonymized values are lower than the original. Nevertheless, octet 2 in the original data provides more certainty than octet 2 in the anonymized data. While the entropy levels in octets 3 and 4 of the original data seem higher than those of the anonymized data, overall, octets 2, 3, and 4 in the anonymized data provide more distributed uncertainty, better randomness, and thus better anonymity. Still, we constrained the random values generated for octets 2, 3, and 4 during the differential privacy procedure not to exceed 255; an octet value of 355 or 400 would affect the usability of the anonymized IP address data. However, it could be argued that the certainty levels are maintained in octet 1 for both the original and anonymized data, with distortion in octets 2, 3, and 4 of the anonymized data, indicating that the flow structure is kept and thus that partial prefix-preserving anonymity might be achieved.

Figure 6: Normalized Shannon entropy values for the original and anonymized IP addresses.

Frequency distribution histogram analysis: furthermore, we did a frequency analysis to compare the distribution of values in each octet of the IP address, for both the original and anonymized IP addresses.
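A per-octet frequency count of the kind compared in Figures 7 through 10 can be sketched in a few lines of Python; the helper below and its name are our illustration, assuming NumPy.

```python
import numpy as np

def octet_frequencies(ip_strings, octet_index):
    """Histogram of one octet position (0-3) across a list of IPs;
    comparing these distributions for the original versus anonymized
    addresses shows how much of the structure survives."""
    vals = np.array([int(ip.split(".")[octet_index]) for ip in ip_strings])
    values, counts = np.unique(vals, return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))

# Example: distribution of the first octet before anonymization.
print(octet_frequencies(["141.121.10.12", "141.121.10.99", "44.7.3.2"], 0))
```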
For the original data, the number of items in octet 1 with values between 40 and 45, that is, source IP addresses whose first octet is 40 to 45, came to approximately 400,000 out of 500,000 records, as shown in Figure 7. Similar results were obtained for the destination IP addresses, with about 300,000 octet 1 items having values of 40 to 45, as illustrated in Figure 8. With the exception of octet 2, the values in octets 3 and 4 are distributed across the range 0 to 85 in the original IP address data; this correlates with the results shown in Figure 6, with higher entropy values for octets 3 and 4 in the original
data, indicating more uncertainty. The x-axis in each graph represents the IP octet values, and the y-axis shows the frequency of each of those octet values. However, a look at the anonymized IP address data shows that octet 1 had about 390,000 values beginning with 200, as shown in Figures 9 and 10 for the source and destination IP address data respectively. These results show the effect of the generalization used in the obfuscation of the original octet 1 data. The values in octets 2, 3, and 4 were distributed across the 0 to 255 range, with the highest concentration around the octet value 190, due to the constraints placed on the differential privacy results to prevent values greater than 255 from being returned. As mentioned earlier, it would not make much sense to have differential privacy results that exceed 255. For octets 2, 3, and 4, the Laplace shape of the distribution is retained, due to the noise distribution used in the differential privacy process.

Figure 7: Frequency distribution for the original source IP octet values.

Figure 8: Frequency distribution for the original destination IP octet values.

Our recommendation as a result of this study is that a privacy engineering approach be highly considered by curators during the anonymization process.

V. CONCLUSION

Anonymizing network traces while maintaining an acceptable level of usability remains a challenge, especially when employing privatization techniques used for microdata obfuscation. Moreover, obfuscating network traces remains problematic due to the IP addresses and octet values being finite. Furthermore, generalized anonymization approaches fail to deliver specific solutions, as each entity will have unique data privacy and usability requirements, and the data in most cases have varying characteristics to be considered during the obfuscation process. In this study, we have provided a review of the literature, pointing out some of the ongoing challenges in network trace anonymization over the last ten years. We have suggested usability-aware anonymization heuristics that employ microdata privacy techniques while taking into consideration the usability of the anonymized network trace data. Our preliminary results show that, with trade-offs, it might be possible to generate anonymized network traces on a case-by-case basis using microdata anonymization techniques such as differential privacy, k-anonymity, generalization, and multiplicative noise addition.

Figure 9: Frequency distribution for anonymized source IP octet values.

In the initial stage of the privacy engineering process, the curators could gather privacy and usability requirements from the stakeholders involved, including both the policy makers and the anticipated users (researchers) of the anonymized network trace data. The curators could then model the most applicable approach, given trade-offs, on a case-by-case basis. The generated anonymization model could then be implemented across the enterprise for uniformity and the prevention of information leakage attacks. On the limitations of this study, focus was placed on usability-aware
anonymization of network trace data and not on the types of attacks on anonymized network traces. While some consideration and mention of anonymization attacks was given in this study, a focused treatment of de-anonymization attacks was beyond its scope and is left as a subject for future work.

Figure 10: Frequency distribution for anonymized destination IP octet values.

ACKNOWLEDGMENT

We would like to express our appreciation to the Los Alamos National Laboratory, and more specifically the Advanced Computing Solutions Group, for making this work possible.

REFERENCES

[1] D. A. Maltz, J. Zhan, G. Xie, H. Zhang, G. Hjálmtýsson, A. Greenberg, and J. Rexford, "Structure preserving anonymization of router configuration data", In Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement (IMC '04), 2004, pp. 239-244.
[2] A. Slagell, J. Wang, and W. Yurcik, "Network log anonymization: Application of Crypto-PAn to Cisco NetFlows", In Proceedings of the Workshop on Secure Knowledge Management, 2004.
[3] M. Bishop, R. Crawford, B. Bhumiratana, L. Clark, and K. Levitt, "Some problems in sanitizing network data", 15th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, 2006, pp. 307-312.
[4] S.E. Coull, C.V. Wright, F. Monrose, M.P. Collins, and M.K. Reiter, "Playing Devil's Advocate: Inferring Sensitive Information from Anonymized Network Traces", In NDSS, 2007, vol. 7, pp. 35-47.
[5] B.F. Ribeiro, W. Chen, G. Miklau, and D.F. Towsley, "Analyzing Privacy in Enterprise Packet Trace Anonymization", In NDSS, 2008.
[6] S. Gattani and T.E. Daniels, "Reference models for network data anonymization", In Proceedings of the 1st ACM Workshop on Network Data Anonymization (NDA '08), 2008, pp. 41-48.
[7] J. King, K. Lakkaraju, and A. Slagell, "A taxonomy and adversarial model for attacks against network log anonymization", In Proceedings of the 2009 ACM Symposium on Applied Computing, 2009, pp. 1286-1293.
[8] S.E. Coull, F. Monrose, M.K. Reiter, and M. Bailey, "The Challenges of Effectively Anonymizing Network Data", Conference For Homeland Security, CATCH 2009, pp. 230-236.
[9] M. Foukarakis, D. Antoniades, and M. Polychronakis, "Deep packet anonymization", In Proceedings of the Second European Workshop on System Security (EUROSEC '09), ACM, 2009, pp. 16-21.
[10] A. Sperotto, G. Schaffrath, R. Sadre, C. Morariu, A. Pras, and B. Stiller, "An overview of IP flow-based intrusion detection", IEEE Communications Surveys & Tutorials, vol. 12, no. 3, 2010, pp. 343-356.
[11] M. Burkhart, D. Schatzmann, B. Trammell, E. Boschi, and B. Plattner, "The role of network trace anonymization under attack", ACM SIGCOMM Computer Communication Review, vol. 40, no. 1, 2010, pp. 5-11.
[12] F. McSherry and R. Mahajan, "Differentially-private network trace analysis", ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, 2011, pp. 123-134.
[13] F. McSherry and R. Mahajan, "Differentially-private network trace analysis", ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, 2011, pp. 123-134.
[14] R.R. Paul, V.C. Valgenti, and M. Kim, "Real-time Netshuffle: Graph distortion for on-line anonymization", 19th IEEE International Conference on Network Protocols (ICNP), 2011, pp. 133-134.
[15] D. Riboni, A. Villani, D. Vitali, C. Bettini, and L.V. Mancini, "Obfuscation of sensitive data in network flows", INFOCOM, 2012 Proceedings, IEEE, 2012, pp. 2372-2380.
[16] W. Qardaji and N. Li, "Anonymizing Network Traces with Temporal Pseudonym Consistency",
IEEE 32nd International Conference on Distributed Computing Systems Workshops (ICDCSW), 2012, pp. 622-633.
[17] M. Mendonca, S. Seetharaman, and K. Obraczka, "A flexible in-network IP anonymization service", In 2012 IEEE International Conference on Communications (ICC), pp. 6651-6656.
[18] S. Jeon, J-H. Yun, and W-N. Kim, "Obfuscation of Critical Infrastructure Network Traffic using Fake Communication", Annual Computer Security Applications Conference (ACSAC), 2013, Poster.
[19] M. Nassar, B. al Bouna, and Q. Malluhi, "Secure Outsourcing of Network Flow Data Analysis", In 2013 IEEE International Congress on Big Data (BigData Congress), 2013, pp. 431-432.
[20] C. Xiaoyun, S. Yujie, T. Xiaosheng, H. Xiaohong, and M. Yan, "On measuring the privacy of anonymized data in multiparty network data sharing", China Communications, vol. 10, no. 5, 2013, pp. 120-127.
[21] Y-D. Lin, P-C. Lin, S-H. Wang, I-W. Chen, and Y-C. Lai, "PCAPLib: A system of extracting, classifying, and anonymizing real packet traces", IEEE Systems Journal, no. 99, pp. 1-12.
[22] J. Stanek, L. Kencl, and J. Kuthan, "Analyzing anomalies in anonymized SIP traffic", In 2014 IFIP Networking Conference, 2014, pp. 1-9.
[23] D. Riboni, A. Villani, D. Vitali, C. Bettini, and L.V. Mancini, "Obfuscation of Sensitive Data for Incremental Release of Network Flows", IEEE Transactions on Networking, no. 99, 2014, pp. 1.
[24] T. Farah and L. Trajkovic, "Anonym: A tool for anonymization of the Internet traffic", In 2013 IEEE International Conference on Cybernetics (CYBCONF), 2013, pp. 261-266.
[25] A.J. Slagell, K. Lakkaraju, and K. Luo, "FLAIM: A Multi-level Anonymization Framework for Computer and Network Logs", In LISA, vol. 6, 2006, pp. 3-8.
[26] J. Xu, J. Fan, M.H. Ammar, and S.B. Moon, "Prefix-preserving IP address anonymization: Measurement-based security evaluation and a new cryptography-based scheme", In 10th IEEE International Conference on Network Protocols, 2002, pp. 280-289.
[27] M. Burkhart, D. Brauckhoff, M. May, and E. Boschi, "The risk-utility tradeoff for IP address truncation", In Proceedings of the 1st ACM Workshop on Network Data Anonymization, 2008, pp. 23-30.
[28] W. Yurcik, C. Woolam, G. Hellings, L. Khan, and B. Thuraisingham, "Measuring anonymization privacy/analysis tradeoffs inherent to sharing network data", IEEE Network Operations and Management Symposium, 2008, pp. 991-994.
[29] V. Ciriani, S.D.C. Vimercati, S. Foresti, and P. Samarati, "Theory of privacy and anonymity", In M. J. Atallah & M. Blanton (Eds.), Algorithms and Theory of Computation Handbook, CRC Press, 2009, pp.
18-33.
[30] P. Samarati and L. Sweeney, "Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression", Technical Report SRI-CSL-98-04, SRI Computer Science Laboratory, 1998.
[31] L. Sweeney, "Achieving k-anonymity privacy protection using generalization and suppression", International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, 2002, pp. 571-588.
[32] T. Dalenius and S.P. Reiss, "Data-swapping: A technique for disclosure control", Journal of Statistical Planning and Inference, vol. 6, no. 1, 1982, pp. 73-85.
[33] J. Kim, "A Method For Limiting Disclosure in Microdata Based on Random Noise and Transformation", In Proceedings of the Survey Research Methods Section, American Statistical Association, vol. A, 1986, pp. 370-374.
[34] J. Kim and W.E. Winkler, "Multiplicative Noise for Masking Continuous Data", Research Report Series, Statistics #2003-01, Statistical Research Division, Washington, D.C., 2003. Retrieved from http://www.census.gov/srd/papers/pdf/rrs2003-01.pdf
[35] C. Dwork, "Differential Privacy", In M. Bugliesi, B. Preneel, V. Sassone, & I. Wegener (Eds.), Automata, Languages and Programming, vol. 4052, Springer, 2006, pp. 1-12.
[36] K. Mivule, "An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge", Dissertation, Computer Science Department, Bowie State University, 2014, ProQuest No. 3619387.
[37] M.H. Dunham, "Data Mining Introductory and Advanced Topics", Prentice Hall, Upper Saddle River, New Jersey, 2003, pp. 58-60, 97-99.
[38] K. Mivule, "Utilizing noise addition for data privacy, an overview", In Proceedings of the International Conference on Information and Knowledge Engineering (IKE), 2012, pp. 65-71.
[39] S.E. Coull, C.V. Wright, A.D. Keromytis, F. Monrose, and M.K. Reiter, "Taming the devil: Techniques for evaluating anonymized network data", In Network and Distributed System Security Symposium, 2008, pp. 125-135.