SlideShare a Scribd company logo
1 of 10
Hadoop
and Big Data Security
Kevin T. Smith, 11/14/2013
Ksmith <AT> Novetta . COM
Big Data Security – Why Should We Care?
New Challenges related to Data Management, Security, and Privacy
As data growth is explosive, so is the complexity of our IT environments
Many organizations required to enforce access control & privacy restrictions on
data sets (HIPAA, Privacy Laws) – or face steep penalties & fines
Organizations are increasingly required to enforce access control to their data
scientists based on Need-to-Know, User Authorization levels, and what data they
are allowed to see – especially in Healthcare, Finance, and Government
Organizations struggling to understand what data they can release

Mismanagement of Data Sets -- Costly..
AOL Research “Data Valdez” Incident
• CNNMoney - “101 Dumbest Moments in Business”
• $5 Million Settlement , plus $100 to each member of AOL between 3/2006-5/2006,
+ $50 to each member who believed their data was in the released data; Fired
employees, CTO Resignation

The Netflix Contest Anonymized Data Set Incident
• Class-Action Lawsuit, $9 Million Settlement

Massachusetts Hospital Record Incident

Cyber Security Attacks are on the Rise
Ponemon Institute – the Average Cost of a Data Breach in the U.S. is 5.4 Million
dollars*
Playstation (2011) – Experts predict costs between 2.2 and 2.4 Billion
* (Breach Study: Global Analysis, May 2013)
A (Brief) History of Hadoop Security
Hadoop developed without Security in Mind
Originally No Security model
No authentication of users or services
Anyone could submit arbitrary code to be executed
Later authorization added, but any user could
impersonate other users with command-line switch

In 2009, Yahoo! focused on Hadoop
Authentication, and did a Hadoop Redesign, But…
Resulting Security Model is Complex
Security Configuration is complex & Easy to Mess Up
No Data at Rest Encryption
Kerberos-Centric
Limited Authorization Capabilities

Things are Changing, But Slowly..

It is important to
understand how
Hadoop Security is
Currently Implemented
& Configured
It is important to
understand how to
meet your
organization’s security
requirements
Hadoop Security Data Flow
Distributed Security
is a Challenge
Since the .20.20x
distributions of
Hadoop, much of
the model is
Kerberos Centric ,
as you see to the
right
Model is quite
complex, as you will
see on the next
slide
Token Delegation &
Hadoop Security Flow
Token

Used For

Kerberos TGT

Kerberos initial authentication to KDC.

Kerberos service
ticket

Kerberos initial authentication between users,
client processes, and services.

Delegation
token

Token issued by the NameNode to the client,
used by the client or any services working on
the client’s behalf to authenticate them to the
NameNode.

Block Access
token

Token issued by the NameNode after
validating authorization to a particular block
of data, based on a shared secret with the
DataNode. Clients (and services working on
the client’s behalf) use the Block Access
token to request blocks from the DataNode.

Job token

This is issued by the JobTracker to
TaskTrackers. Tasks communicating with
TaskTrackers for a particular job use this
token to prove they are associated with the
job.
Some Vendor Activity in Hadoop Security
Seems to be a New One Every Week!

Cloudera Sentry – Fine Grained Access Control for Apache Hive & Cloudera Impala
IBM InfoSphere Optim Data Masking – Optim Data Masking provides “Deidentification” of data by obfuscating corporate secrets, Guardium provides
monitoring & auditing
Intel’s Secure Hadoop Distribution – Encryption in transit & at rest, Granular access
control with HBase
DataStax Enterprise – Encryption in Transit & at Rest (using Cassandra for storage)
DataGuise for Hadoop – Detects & protects sensitive data, setting access
permission, masking or encrypting data, authorization based access
Knox Gateway (Hortonworks) – Perimeter security, integration with IDAM
environments, manage security across multiple clusters – now an Apache Project
Protegrity – Big Data Protector provides Encryption & tokenization, Enterprise
Security Administrator provides central policy, key mgmt, auditing, reporting
Sqrrl – Builds on Apache Accumulo’s security capabilities for Hadoop
Zettaset Secure Orchestrator – security wrapper around Hadoop
Apache Accumulo
• Cell-Level Access Control via visibility
• By default, uses its own db for
users & credentials
• Can be extended in code to use other
Identity & Access Management
Infrastructure
Project Rhino
Intel launched this open source effort to improve security capabilities of Hadoop &
contributed code to Apache in early 2013.
Encrypted Data at Rest - JIRA Tasks HADOOP-9331 (Hadoop Crypto Codec
Framework and Crypto Codec Implementation) and MAPREDUCE-5025 (Key
Distribution and Management for Supporting Crypto Codec in MapReduce) .
ZOOKEEPER-1688 will provide the ability for transparent encryption of snapshots
and commit logs on disk, protecting against the leakage of sensitive information
from files at rest.
Token-Based Authentication & Unified Authorization Framework - JIRA
TasksHADOOP-9392 (Token-Based Authentication and Single Sign-On) and HADOOP9466(Unified Authorization Framework)
Improved Security in HBase - The JIRA Task HBASE-6222 (Add Per-KeyValue
Security) adds cell-level authorization to HBase – something that Apache Accumulo
has but HBase does not. HBASE-7544 builds on the encryption framework being
developed, extending it to HBase, providing transparent table encryption.
What’s the Best Guidance Now?
Identify and Understand the Sensitivity Levels of Your Data
Are there access control policies associated with your data?

Understand the Impact of the Release of Your Data
Netflix example – Could someone couple your data with open source data to
gain new (and unintended) insight?

Develop Policies & Procedures relating to Security & Privacy of Your Data
Sets
Data Ingest
Access Control within Your Organization
Cleansing/Sanitization/Destruction
Auditing
Monitoring Procedures
Incident Response

Develop a Technical Security Approach that Complements Hadoop Security
Questions?
Ksmith <AT> Novetta.COM

More Related Content

What's hot

Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...BigDataEverywhere
 
Fighting cyber fraud with hadoop v2
Fighting cyber fraud with hadoop v2Fighting cyber fraud with hadoop v2
Fighting cyber fraud with hadoop v2Niel Dunnage
 
AWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a Service
AWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a ServiceAWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a Service
AWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a ServiceAmazon Web Services
 
Data Governance and Management in Cloud pak nam
Data Governance and Management in Cloud pak namData Governance and Management in Cloud pak nam
Data Governance and Management in Cloud pak namPT Datacomm Diangraha
 
Solving the Really Big Tech Problems with IoT
 Solving the Really Big Tech Problems with IoT Solving the Really Big Tech Problems with IoT
Solving the Really Big Tech Problems with IoTEric Kavanagh
 
Lessons Learned from Building and Running MHN, the World's Largest Crowdsourc...
Lessons Learned from Building and Running MHN, the World's Largest Crowdsourc...Lessons Learned from Building and Running MHN, the World's Largest Crowdsourc...
Lessons Learned from Building and Running MHN, the World's Largest Crowdsourc...Jason Trost
 
Modern Honey Network at Bay Area Open Source Security Hackers
Modern Honey Network at Bay Area Open Source Security HackersModern Honey Network at Bay Area Open Source Security Hackers
Modern Honey Network at Bay Area Open Source Security HackersJason Trost
 
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...DataWorks Summit/Hadoop Summit
 
GTB Data Leakage Prevention Use Cases 2014
GTB Data Leakage Prevention Use Cases 2014GTB Data Leakage Prevention Use Cases 2014
GTB Data Leakage Prevention Use Cases 2014Ravindran Vasu
 
IRJET- Mutual Key Oversight Procedure for Cloud Security and Distribution of ...
IRJET- Mutual Key Oversight Procedure for Cloud Security and Distribution of ...IRJET- Mutual Key Oversight Procedure for Cloud Security and Distribution of ...
IRJET- Mutual Key Oversight Procedure for Cloud Security and Distribution of ...IRJET Journal
 
Web Enabled DDS - London Connext DDS Conference
Web Enabled DDS - London Connext DDS ConferenceWeb Enabled DDS - London Connext DDS Conference
Web Enabled DDS - London Connext DDS ConferenceGerardo Pardo-Castellote
 
Anomali Detect 2016 - Borderless Threat Intelligence
Anomali Detect 2016 - Borderless Threat IntelligenceAnomali Detect 2016 - Borderless Threat Intelligence
Anomali Detect 2016 - Borderless Threat IntelligenceJason Trost
 
SANS CTI Summit 2016 Borderless Threat Intelligence
SANS CTI Summit 2016 Borderless Threat IntelligenceSANS CTI Summit 2016 Borderless Threat Intelligence
SANS CTI Summit 2016 Borderless Threat IntelligenceJason Trost
 
IRJET- Multi-Owner Keyword Search over Cloud with Cryptography
IRJET- Multi-Owner Keyword Search over Cloud with CryptographyIRJET- Multi-Owner Keyword Search over Cloud with Cryptography
IRJET- Multi-Owner Keyword Search over Cloud with CryptographyIRJET Journal
 
Accelerated Migrations with Nuix
Accelerated Migrations with NuixAccelerated Migrations with Nuix
Accelerated Migrations with NuixCarey Bandler
 
Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...DataWorks Summit
 
Data Driven Security in SSAS
Data Driven Security in SSASData Driven Security in SSAS
Data Driven Security in SSASMike Duffy
 
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...DataWorks Summit
 

What's hot (19)

Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
 
Fighting cyber fraud with hadoop v2
Fighting cyber fraud with hadoop v2Fighting cyber fraud with hadoop v2
Fighting cyber fraud with hadoop v2
 
AWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a Service
AWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a ServiceAWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a Service
AWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a Service
 
Data Governance and Management in Cloud pak nam
Data Governance and Management in Cloud pak namData Governance and Management in Cloud pak nam
Data Governance and Management in Cloud pak nam
 
Solving the Really Big Tech Problems with IoT
 Solving the Really Big Tech Problems with IoT Solving the Really Big Tech Problems with IoT
Solving the Really Big Tech Problems with IoT
 
Lessons Learned from Building and Running MHN, the World's Largest Crowdsourc...
Lessons Learned from Building and Running MHN, the World's Largest Crowdsourc...Lessons Learned from Building and Running MHN, the World's Largest Crowdsourc...
Lessons Learned from Building and Running MHN, the World's Largest Crowdsourc...
 
Modern Honey Network at Bay Area Open Source Security Hackers
Modern Honey Network at Bay Area Open Source Security HackersModern Honey Network at Bay Area Open Source Security Hackers
Modern Honey Network at Bay Area Open Source Security Hackers
 
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
 
GTB Data Leakage Prevention Use Cases 2014
GTB Data Leakage Prevention Use Cases 2014GTB Data Leakage Prevention Use Cases 2014
GTB Data Leakage Prevention Use Cases 2014
 
IRJET- Mutual Key Oversight Procedure for Cloud Security and Distribution of ...
IRJET- Mutual Key Oversight Procedure for Cloud Security and Distribution of ...IRJET- Mutual Key Oversight Procedure for Cloud Security and Distribution of ...
IRJET- Mutual Key Oversight Procedure for Cloud Security and Distribution of ...
 
Web Enabled DDS - London Connext DDS Conference
Web Enabled DDS - London Connext DDS ConferenceWeb Enabled DDS - London Connext DDS Conference
Web Enabled DDS - London Connext DDS Conference
 
Anomali Detect 2016 - Borderless Threat Intelligence
Anomali Detect 2016 - Borderless Threat IntelligenceAnomali Detect 2016 - Borderless Threat Intelligence
Anomali Detect 2016 - Borderless Threat Intelligence
 
SANS CTI Summit 2016 Borderless Threat Intelligence
SANS CTI Summit 2016 Borderless Threat IntelligenceSANS CTI Summit 2016 Borderless Threat Intelligence
SANS CTI Summit 2016 Borderless Threat Intelligence
 
IRJET- Multi-Owner Keyword Search over Cloud with Cryptography
IRJET- Multi-Owner Keyword Search over Cloud with CryptographyIRJET- Multi-Owner Keyword Search over Cloud with Cryptography
IRJET- Multi-Owner Keyword Search over Cloud with Cryptography
 
Accelerated Migrations with Nuix
Accelerated Migrations with NuixAccelerated Migrations with Nuix
Accelerated Migrations with Nuix
 
Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...
 
Data Driven Security in SSAS
Data Driven Security in SSASData Driven Security in SSAS
Data Driven Security in SSAS
 
Data Leakage Prevention
Data Leakage PreventionData Leakage Prevention
Data Leakage Prevention
 
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
 

Viewers also liked

2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_securityAdam Muise
 
Hadoop security
Hadoop securityHadoop security
Hadoop securityBiju Nair
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with HadoopCloudera, Inc.
 
2008-01-22 Red Hat (Security) Roadmap Presentation
2008-01-22 Red Hat (Security) Roadmap Presentation2008-01-22 Red Hat (Security) Roadmap Presentation
2008-01-22 Red Hat (Security) Roadmap PresentationShawn Wells
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...IndicThreads
 
Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Lu Wei
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Jonathan Seidman
 
Introduction to Hadoop - The Essentials
Introduction to Hadoop - The EssentialsIntroduction to Hadoop - The Essentials
Introduction to Hadoop - The EssentialsFadi Yousuf
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceMahantesh Angadi
 
Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Datameer
 
Big Data Security with HP ArcSight
Big Data Security with HP ArcSightBig Data Security with HP ArcSight
Big Data Security with HP ArcSightSridhar Karnam
 
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...Cloudera, Inc.
 
Hadoop Security Now and Future
Hadoop Security Now and FutureHadoop Security Now and Future
Hadoop Security Now and Futuretcloudcomputing-tw
 
The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014Cloudera, Inc.
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterEdureka!
 
LPWA-Open for Business. It’s time to execute
LPWA-Open for Business. It’s time to executeLPWA-Open for Business. It’s time to execute
LPWA-Open for Business. It’s time to executeTelefónica IoT
 
Big Data, Security Intelligence, (And Why I Hate This Title)
Big Data, Security Intelligence, (And Why I Hate This Title) Big Data, Security Intelligence, (And Why I Hate This Title)
Big Data, Security Intelligence, (And Why I Hate This Title) Coastal Pet Products, Inc.
 

Viewers also liked (20)

2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_security
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with Hadoop
 
Big Data Security and Governance
Big Data Security and GovernanceBig Data Security and Governance
Big Data Security and Governance
 
2008-01-22 Red Hat (Security) Roadmap Presentation
2008-01-22 Red Hat (Security) Roadmap Presentation2008-01-22 Red Hat (Security) Roadmap Presentation
2008-01-22 Red Hat (Security) Roadmap Presentation
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
 
Big security for big data
Big security for big dataBig security for big data
Big security for big data
 
Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013
 
Introduction to Hadoop - The Essentials
Introduction to Hadoop - The EssentialsIntroduction to Hadoop - The Essentials
Introduction to Hadoop - The Essentials
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
 
Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?
 
Big Data Security with HP ArcSight
Big Data Security with HP ArcSightBig Data Security with HP ArcSight
Big Data Security with HP ArcSight
 
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
 
Hadoop Security Now and Future
Hadoop Security Now and FutureHadoop Security Now and Future
Hadoop Security Now and Future
 
The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
 
LPWA-Open for Business. It’s time to execute
LPWA-Open for Business. It’s time to executeLPWA-Open for Business. It’s time to execute
LPWA-Open for Business. It’s time to execute
 
Big Data, Security Intelligence, (And Why I Hate This Title)
Big Data, Security Intelligence, (And Why I Hate This Title) Big Data, Security Intelligence, (And Why I Hate This Title)
Big Data, Security Intelligence, (And Why I Hate This Title)
 

Similar to Hadoop and Big Data Security

Cryptographie avancée et Logical Data Fabric : Accélérez le partage et la mig...
Cryptographie avancée et Logical Data Fabric : Accélérez le partage et la mig...Cryptographie avancée et Logical Data Fabric : Accélérez le partage et la mig...
Cryptographie avancée et Logical Data Fabric : Accélérez le partage et la mig...Denodo
 
A robust and verifiable threshold multi authority access control system in pu...
A robust and verifiable threshold multi authority access control system in pu...A robust and verifiable threshold multi authority access control system in pu...
A robust and verifiable threshold multi authority access control system in pu...IJARIIT
 
A Trusted TPA Model, to Improve Security & Reliability for Cloud Storage
A Trusted TPA Model, to Improve Security & Reliability for Cloud StorageA Trusted TPA Model, to Improve Security & Reliability for Cloud Storage
A Trusted TPA Model, to Improve Security & Reliability for Cloud StorageIRJET Journal
 
Zero Trust 20211105
Zero Trust 20211105 Zero Trust 20211105
Zero Trust 20211105 Thomas Treml
 
Achieving Secure, sclable and finegrained Cloud computing report
Achieving Secure, sclable and finegrained Cloud computing reportAchieving Secure, sclable and finegrained Cloud computing report
Achieving Secure, sclable and finegrained Cloud computing reportKiran Girase
 
IRJET- A Novel and Secure Approach to Control and Access Data in Cloud St...
IRJET-  	  A Novel and Secure Approach to Control and Access Data in Cloud St...IRJET-  	  A Novel and Secure Approach to Control and Access Data in Cloud St...
IRJET- A Novel and Secure Approach to Control and Access Data in Cloud St...IRJET Journal
 
Simplifying Data Governance and Security with a Logical Data Fabric (ASEAN)
Simplifying Data Governance and Security with a Logical Data Fabric (ASEAN)Simplifying Data Governance and Security with a Logical Data Fabric (ASEAN)
Simplifying Data Governance and Security with a Logical Data Fabric (ASEAN)Denodo
 
Proxy-Oriented Data Uploading & Monitoring Remote Data Integrity in Public Cloud
Proxy-Oriented Data Uploading & Monitoring Remote Data Integrity in Public CloudProxy-Oriented Data Uploading & Monitoring Remote Data Integrity in Public Cloud
Proxy-Oriented Data Uploading & Monitoring Remote Data Integrity in Public CloudIRJET Journal
 
Trusted db a trusted hardware based database with privacy and data confidenti...
Trusted db a trusted hardware based database with privacy and data confidenti...Trusted db a trusted hardware based database with privacy and data confidenti...
Trusted db a trusted hardware based database with privacy and data confidenti...LeMeniz Infotech
 
A New Mode to Ensure Security in Cloud Computing Services
A New Mode to Ensure Security in Cloud Computing ServicesA New Mode to Ensure Security in Cloud Computing Services
A New Mode to Ensure Security in Cloud Computing ServicesMahmuda Rahman
 
Fighting cyber fraud with hadoop
Fighting cyber fraud with hadoopFighting cyber fraud with hadoop
Fighting cyber fraud with hadoopNiel Dunnage
 
Preparing for the Cybersecurity Renaissance
Preparing for the Cybersecurity RenaissancePreparing for the Cybersecurity Renaissance
Preparing for the Cybersecurity RenaissanceCloudera, Inc.
 
Implementing an improved security for collin’s database and telecommuters
Implementing an improved security for collin’s database and telecommutersImplementing an improved security for collin’s database and telecommuters
Implementing an improved security for collin’s database and telecommutersRishabh Gupta
 
A Secure, Scalable, Flexible and Fine-Grained Access Control Using Hierarchic...
A Secure, Scalable, Flexible and Fine-Grained Access Control Using Hierarchic...A Secure, Scalable, Flexible and Fine-Grained Access Control Using Hierarchic...
A Secure, Scalable, Flexible and Fine-Grained Access Control Using Hierarchic...Editor IJCATR
 
Privacy Preserving in Authentication Protocol for Shared Authority Based Clou...
Privacy Preserving in Authentication Protocol for Shared Authority Based Clou...Privacy Preserving in Authentication Protocol for Shared Authority Based Clou...
Privacy Preserving in Authentication Protocol for Shared Authority Based Clou...IRJET Journal
 
0th PPT - BLOCKCHAIN-CBE (1).ppt
0th PPT - BLOCKCHAIN-CBE (1).ppt0th PPT - BLOCKCHAIN-CBE (1).ppt
0th PPT - BLOCKCHAIN-CBE (1).pptVarioTechnology
 
Compliance in the Cloud
Compliance in the CloudCompliance in the Cloud
Compliance in the CloudRapidScale
 

Similar to Hadoop and Big Data Security (20)

Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Cryptographie avancée et Logical Data Fabric : Accélérez le partage et la mig...
Cryptographie avancée et Logical Data Fabric : Accélérez le partage et la mig...Cryptographie avancée et Logical Data Fabric : Accélérez le partage et la mig...
Cryptographie avancée et Logical Data Fabric : Accélérez le partage et la mig...
 
A robust and verifiable threshold multi authority access control system in pu...
A robust and verifiable threshold multi authority access control system in pu...A robust and verifiable threshold multi authority access control system in pu...
A robust and verifiable threshold multi authority access control system in pu...
 
A Trusted TPA Model, to Improve Security & Reliability for Cloud Storage
A Trusted TPA Model, to Improve Security & Reliability for Cloud StorageA Trusted TPA Model, to Improve Security & Reliability for Cloud Storage
A Trusted TPA Model, to Improve Security & Reliability for Cloud Storage
 
Zero Trust 20211105
Zero Trust 20211105 Zero Trust 20211105
Zero Trust 20211105
 
Achieving Secure, sclable and finegrained Cloud computing report
Achieving Secure, sclable and finegrained Cloud computing reportAchieving Secure, sclable and finegrained Cloud computing report
Achieving Secure, sclable and finegrained Cloud computing report
 
IRJET- A Novel and Secure Approach to Control and Access Data in Cloud St...
IRJET-  	  A Novel and Secure Approach to Control and Access Data in Cloud St...IRJET-  	  A Novel and Secure Approach to Control and Access Data in Cloud St...
IRJET- A Novel and Secure Approach to Control and Access Data in Cloud St...
 
Simplifying Data Governance and Security with a Logical Data Fabric (ASEAN)
Simplifying Data Governance and Security with a Logical Data Fabric (ASEAN)Simplifying Data Governance and Security with a Logical Data Fabric (ASEAN)
Simplifying Data Governance and Security with a Logical Data Fabric (ASEAN)
 
Proxy-Oriented Data Uploading & Monitoring Remote Data Integrity in Public Cloud
Proxy-Oriented Data Uploading & Monitoring Remote Data Integrity in Public CloudProxy-Oriented Data Uploading & Monitoring Remote Data Integrity in Public Cloud
Proxy-Oriented Data Uploading & Monitoring Remote Data Integrity in Public Cloud
 
Trusted db a trusted hardware based database with privacy and data confidenti...
Trusted db a trusted hardware based database with privacy and data confidenti...Trusted db a trusted hardware based database with privacy and data confidenti...
Trusted db a trusted hardware based database with privacy and data confidenti...
 
A New Mode to Ensure Security in Cloud Computing Services
A New Mode to Ensure Security in Cloud Computing ServicesA New Mode to Ensure Security in Cloud Computing Services
A New Mode to Ensure Security in Cloud Computing Services
 
Fighting cyber fraud with hadoop
Fighting cyber fraud with hadoopFighting cyber fraud with hadoop
Fighting cyber fraud with hadoop
 
Preparing for the Cybersecurity Renaissance
Preparing for the Cybersecurity RenaissancePreparing for the Cybersecurity Renaissance
Preparing for the Cybersecurity Renaissance
 
Implementing an improved security for collin’s database and telecommuters
Implementing an improved security for collin’s database and telecommutersImplementing an improved security for collin’s database and telecommuters
Implementing an improved security for collin’s database and telecommuters
 
A Secure, Scalable, Flexible and Fine-Grained Access Control Using Hierarchic...
A Secure, Scalable, Flexible and Fine-Grained Access Control Using Hierarchic...A Secure, Scalable, Flexible and Fine-Grained Access Control Using Hierarchic...
A Secure, Scalable, Flexible and Fine-Grained Access Control Using Hierarchic...
 
Rik Ferguson
Rik FergusonRik Ferguson
Rik Ferguson
 
Privacy Preserving in Authentication Protocol for Shared Authority Based Clou...
Privacy Preserving in Authentication Protocol for Shared Authority Based Clou...Privacy Preserving in Authentication Protocol for Shared Authority Based Clou...
Privacy Preserving in Authentication Protocol for Shared Authority Based Clou...
 
0th PPT - BLOCKCHAIN-CBE (1).ppt
0th PPT - BLOCKCHAIN-CBE (1).ppt0th PPT - BLOCKCHAIN-CBE (1).ppt
0th PPT - BLOCKCHAIN-CBE (1).ppt
 
Compliance in the Cloud
Compliance in the CloudCompliance in the Cloud
Compliance in the Cloud
 
SAP HANA Cloud Security
SAP HANA Cloud SecuritySAP HANA Cloud Security
SAP HANA Cloud Security
 

More from Chicago Hadoop Users Group

Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Chicago Hadoop Users Group
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChicago Hadoop Users Group
 
Everything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieEverything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieChicago Hadoop Users Group
 
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopAn Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopChicago Hadoop Users Group
 
HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917Chicago Hadoop Users Group
 
Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Chicago Hadoop Users Group
 

More from Chicago Hadoop Users Group (19)

Kinetica master chug_9.12
Kinetica master chug_9.12Kinetica master chug_9.12
Kinetica master chug_9.12
 
Chug dl presentation
Chug dl presentationChug dl presentation
Chug dl presentation
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Using Apache Drill
Using Apache DrillUsing Apache Drill
Using Apache Drill
 
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
 
Meet Spark
Meet SparkMeet Spark
Meet Spark
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your Business
 
An Overview of Ambari
An Overview of AmbariAn Overview of Ambari
An Overview of Ambari
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
Advanced Oozie
Advanced OozieAdvanced Oozie
Advanced Oozie
 
Scalding for Hadoop
Scalding for HadoopScalding for Hadoop
Scalding for Hadoop
 
Financial Data Analytics with Hadoop
Financial Data Analytics with HadoopFinancial Data Analytics with Hadoop
Financial Data Analytics with Hadoop
 
Everything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieEverything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about Oozie
 
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopAn Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache Hadoop
 
HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917
 
Map Reduce v2 and YARN - CHUG - 20120604
Map Reduce v2 and YARN - CHUG - 20120604Map Reduce v2 and YARN - CHUG - 20120604
Map Reduce v2 and YARN - CHUG - 20120604
 
Hadoop in a Windows Shop - CHUG - 20120416
Hadoop in a Windows Shop - CHUG - 20120416Hadoop in a Windows Shop - CHUG - 20120416
Hadoop in a Windows Shop - CHUG - 20120416
 
Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815
 
Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416
 

Recently uploaded

Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Recently uploaded (20)

Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Hadoop and Big Data Security

  • 1. Hadoop and Big Data Security Kevin T. Smith, 11/14/2013 Ksmith <AT> Novetta . COM
  • 2. Big Data Security – Why Should We Care? New Challenges related to Data Management, Security, and Privacy As data growth is explosive, so is the complexity of our IT environments Many organizations required to enforce access control & privacy restrictions on data sets (HIPAA, Privacy Laws) – or face steep penalties & fines Organizations are increasingly required to enforce access control to their data scientists based on Need-to-Know, User Authorization levels, and what data they are allowed to see – especially in Healthcare, Finance, and Government Organizations struggling to understand what data they can release Mismanagement of Data Sets -- Costly.. AOL Research “Data Valdez” Incident • CNNMoney - “101 Dumbest Moments in Business” • $5 Million Settlement , plus $100 to each member of AOL between 3/2006-5/2006, + $50 to each member who believed their data was in the released data; Fired employees, CTO Resignation The Netflix Contest Anonymized Data Set Incident • Class-Action Lawsuit, $9 Million Settlement Massachusetts Hospital Record Incident Cyber Security Attacks are on the Rise Ponemon Institute – the Average Cost of a Data Breach in the U.S. is 5.4 Million dollars* Playstation (2011) – Experts predict costs between 2.2 and 2.4 Billion * (Breach Study: Global Analysis, May 2013)
  • 3. A (Brief) History of Hadoop Security Hadoop developed without Security in Mind Originally No Security model No authentication of users or services Anyone could submit arbitrary code to be executed Later authorization added, but any user could impersonate other users with command-line switch In 2009, Yahoo! focused on Hadoop Authentication, and did a Hadoop Redesign, But… Resulting Security Model is Complex Security Configuration is complex & Easy to Mess Up No Data at Rest Encryption Kerberos-Centric Limited Authorization Capabilities Things are Changing, But Slowly.. It is important to understand how Hadoop Security is Currently Implemented & Configured It is important to understand how to meet your organization’s security requirements
  • 4. Hadoop Security Data Flow Distributed Security is a Challenge Since the .20.20x distributions of Hadoop, much of the model is Kerberos Centric , as you see to the right Model is quite complex, as you will see on the next slide
  • 5. Token Delegation & Hadoop Security Flow Token Used For Kerberos TGT Kerberos initial authentication to KDC. Kerberos service ticket Kerberos initial authentication between users, client processes, and services. Delegation token Token issued by the NameNode to the client, used by the client or any services working on the client’s behalf to authenticate them to the NameNode. Block Access token Token issued by the NameNode after validating authorization to a particular block of data, based on a shared secret with the DataNode. Clients (and services working on the client’s behalf) use the Block Access token to request blocks from the DataNode. Job token This is issued by the JobTracker to TaskTrackers. Tasks communicating with TaskTrackers for a particular job use this token to prove they are associated with the job.
  • 6. Some Vendor Activity in Hadoop Security Seems to be a New One Every Week! Cloudera Sentry – Fine Grained Access Control for Apache Hive & Cloudera Impala IBM InfoSphere Optim Data Masking – Optim Data Masking provides “Deidentification” of data by obfuscating corporate secrets, Guardium provides monitoring & auditing Intel’s Secure Hadoop Distribution – Encryption in transit & at rest, Granular access control with HBase DataStax Enterprise – Encryption in Transit & at Rest (using Cassandra for storage) DataGuise for Hadoop – Detects & protects sensitive data, setting access permission, masking or encrypting data, authorization based access Knox Gateway (Hortonworks) – Perimeter security, integration with IDAM environments, manage security across multiple clusters – now an Apache Project Protegrity – Big Data Protector provides Encryption & tokenization, Enterprise Security Administrator provides central policy, key mgmt, auditing, reporting Sqrrl – Builds on Apache Accumulo’s security capabilities for Hadoop Zettaset Secure Orchestrator – security wrapper around Hadoop
  • 7. Apache Accumulo • Cell-Level Access Control via visibility • By default, uses its own db for users & credentials • Can be extended in code to use other Identity & Access Management Infrastructure
  • 8. Project Rhino Intel launched this open source effort to improve security capabilities of Hadoop & contributed code to Apache in early 2013. Encrypted Data at Rest - JIRA Tasks HADOOP-9331 (Hadoop Crypto Codec Framework and Crypto Codec Implementation) and MAPREDUCE-5025 (Key Distribution and Management for Supporting Crypto Codec in MapReduce) . ZOOKEEPER-1688 will provide the ability for transparent encryption of snapshots and commit logs on disk, protecting against the leakage of sensitive information from files at rest. Token-Based Authentication & Unified Authorization Framework - JIRA TasksHADOOP-9392 (Token-Based Authentication and Single Sign-On) and HADOOP9466(Unified Authorization Framework) Improved Security in HBase - The JIRA Task HBASE-6222 (Add Per-KeyValue Security) adds cell-level authorization to HBase – something that Apache Accumulo has but HBase does not. HBASE-7544 builds on the encryption framework being developed, extending it to HBase, providing transparent table encryption.
  • 9. What’s the Best Guidance Now? Identify and Understand the Sensitivity Levels of Your Data Are there access control policies associated with your data? Understand the Impact of the Release of Your Data Netflix example – Could someone couple your data with open source data to gain new (and unintended) insight? Develop Policies & Procedures relating to Security & Privacy of Your Data Sets Data Ingest Access Control within Your Organization Cleansing/Sanitization/Destruction Auditing Monitoring Procedures Incident Response Develop a Technical Security Approach that Complements Hadoop Security