© Cloudera, Inc. All rights reserved.
Cloudera training: secure your Cloudera cluster
The demand for skills is high and Hadoop is the future. Customers
cannot afford to move slowly in staffing their Big Data projects.
Customers are building plans to ensure projects are staffed with
skilled employees, and supported by a qualified services provider.
Job Trends from Indeed.com
What are you most concerned about when it comes to your readiness for big data and Hadoop?
Cloudera MDP webinar poll results, July 2016
Why Cloudera training?
Aligned to best practices and the pace of change
1 Broadest range of courses
Learning paths for Developer, Admin, Analyst
2 Most experienced instructors
More than 40,000 trained since 2009
3 Leader in certification
Over 12,000 accredited Cloudera professionals
4 Trusted source for training
100,000+ people have attended online courses
5 State of the art curriculum
Courses updated as Hadoop evolves
6 Widest geographic coverage
Most classes offered: 50 cities worldwide plus online
7 Most relevant platform & community
CDH deployed more than all other distributions combined
8 Depth of training material
Hands-on labs and VMs support live instruction
9 Ongoing learning
Video tutorials and e-learning complement training
10 Commitment to big data education
University partnerships to teach Hadoop in colleges
Creating leaders in the field
Training enables Big Data solutions and innovation
94% Would recommend or highly recommend Cloudera training to friends or colleagues
88% Indicate Cloudera training provided the Hadoop expertise their roles require
66% Draw on lessons from Cloudera training on at least a monthly basis
40% Develop new apps or perform business-critical analyses as a result of training alone
Sources: Cloudera Past Public Training Participant Study, December 2012. Cloudera Customer Satisfaction Study, January 2013.
What is available from Cloudera University?
• Private training: Course delivered at location of customer choice to internal audience
• Public training: Courses regularly scheduled around the globe. Schedule available on web
• Virtual training: Live training accessed via the internet; available for public and private courses
• OnDemand training: Pre-recorded lecture with identical content/exercises as live training options
• Certification: Rigorously developed and meaningful bodies of knowledge
OnDemand | Virtual live classroom | Private onsite | Public live classroom
Suggested Cloudera University curricula
Developers
• Python/Scala Training
• Developer for Spark and Hadoop
• CCA: Spark and Hadoop Developer
• Spark ML & Kafka modules
• Topic specific training (Search, HBase)
• Hands on practice
• CCP: Data Engineer
Administrators
• Cloudera Administration training
• CCA: Administrator
• Cloudera Security OnDemand
Data Analysts/Data Scientists
• Data Analyst: Using Hive, Pig & Impala
• CCA: Data Analyst
• Cloudera Data Science
Security for Hadoop
Carlo Lazzaris | Technical Instructor
Security Webinar Agenda
1. The need for Hadoop Security
Hacker news and legal regulations
2. Cloudera Security Implementation
Five levels of security
3. How to secure your Cloudera cluster
Cloudera Documentation
Cloudera professional services
Cloudera OnDemand security course
The need for Hadoop security
Unguarded data stores are the victims
Regulatory Compliance
Organizations can be fined up to 4% of annual global turnover or €20 million, whichever is higher, for breaching GDPR
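As a rough worked example of this ceiling (a sketch, not legal advice), the upper limit is the larger of the two amounts. The function name and the sample turnover figures are illustrative assumptions, not part of the original slides.

```python
from decimal import Decimal

GDPR_FLAT_CAP_EUR = Decimal("20000000")   # EUR 20 million
GDPR_TURNOVER_RATE = Decimal("0.04")      # 4% of annual global turnover


def max_gdpr_fine(annual_global_turnover_eur: Decimal) -> Decimal:
    """Return the upper bound of the fine: the higher of the two limits."""
    return max(GDPR_FLAT_CAP_EUR, GDPR_TURNOVER_RATE * annual_global_turnover_eur)


if __name__ == "__main__":
    # Hypothetical turnovers: one below and one above the EUR 500 million
    # point at which 4% of turnover overtakes the flat cap.
    for turnover in (Decimal("100000000"), Decimal("2000000000")):
        print(f"turnover EUR {turnover:,}: fine up to EUR {max_gdpr_fine(turnover):,}")
```

For smaller organizations the flat cap dominates; the percentage only takes over once turnover exceeds €500 million.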
Cloudera security implementation
Cloudera Enterprise CDH
The modern platform for machine learning and analytics optimized for the cloud
[Diagram: platform layers]
Core services: Data Engineering, Operational Database, Analytic Database, Data Science
Extensible services
Data Catalog | Ingest & Replication | Security | Governance | Workload Management
Storage services: S3, ADLS, HDFS, Kudu
Open platform services
Built for multi-function analytics | Optimized for cloud
• Unified security – protects sensitive data with consistent controls, even for transient and recurring workloads
• Consistent governance – enables secure self-service access to all relevant data and increases compliance
• Easy workload management – increases user productivity and boosts job predictability
• Flexible ingest and replication – aggregates a single copy of all data, provides disaster recovery, and eases migration
• Shared catalog – defines and preserves structure and business context of data for new applications and partner solutions
Cloudera Enterprise-Grade Security and Governance
Identity (Cloudera Manager)
Validate users by membership in enterprise directory
Technical concepts: Authentication, User/group mapping
Access (Apache Sentry)
Defining what users and applications can do with data
Technical concepts: Permissions, Authorization
Visibility (Cloudera Navigator)
Reporting on where data came from and how it’s being used
Technical concepts: Auditing, Lineage
Data Protection (Navigator Encrypt & Key Trustee)
Shielding data in the cluster from unauthorized visibility
Technical concepts: Encryption at rest & in motion
Cloudera Certified Technology Partners
Data Sources | Data Ingest | Process, Refine & Prep | Data Discovery | Advanced Analytics
Connected Machines/Data sources
Other Data Sources
Certification ensures a product integrates securely
Authentication
• Authenticate via Kerberos or LDAP
Authorization
• Work with Apache Sentry for Hive, Impala, Search, HDFS
Encryption
• Support HDFS transport encryption and at-rest encryption; support SSL/TLS connection encryption
Vulnerability Response and Process
Vulnerability reports (upstream, internal, external) → Fix → Publish
Cluster Security Levels
Cloudera Enterprise
The modern platform for machine learning and analytics optimized for the cloud
Enterprise Encryption Performance
Disclaimer
This talk serves as a general guideline for security implementation on Hadoop.
The actual implementation procedures and scope of implementation vary on a case-by-case basis, and should be assessed by Cloudera’s Professional Services team or certified Cloudera SI Partners.
Non-secure #0
Data Free for All
[Diagram: datacenter with a firewall, Active Directory/KDC, applications, and a Hadoop cluster made up of Cloudera Manager, a gateway node, and worker nodes]
4 modes of Identity Management
1. Simple Authentication
2. Kerberos
3. LDAP
4. SAML
File group ownership
• AD integration
• SSSD or Centrify
Consideration in large enterprises
[Diagram labels: "via SSSD", "via"]
Simple authentication detects the user
[Diagram: firewall, Active Directory, Cloudera Manager, two master nodes, and three worker nodes (SSSD/Centrify)]
Simple authentication =
no authentication
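To illustrate why simple authentication amounts to no authentication, the sketch below models a service that accepts whatever user name the client reports. `SimpleAuthService` and the sample user names are hypothetical; the only real detail assumed is that, without Kerberos, a Hadoop client can assert its own identity, for example through the `HADOOP_USER_NAME` environment variable.

```python
class SimpleAuthService:
    """Toy service that, like simple authentication, trusts the claimed name."""

    def __init__(self, superusers):
        self.superusers = set(superusers)

    def whoami(self, env):
        # The identity comes straight from the client side; nothing verifies it.
        return env.get("HADOOP_USER_NAME") or env.get("USER", "unknown")

    def can_delete_everything(self, env):
        # Authorization is only as strong as the identity it is based on.
        return self.whoami(env) in self.superusers


if __name__ == "__main__":
    service = SimpleAuthService(superusers={"hdfs"})
    honest = {"USER": "ssmith"}
    # Same person, but claiming to be the superuser by setting one variable.
    impostor = {"USER": "ssmith", "HADOOP_USER_NAME": "hdfs"}
    print(service.can_delete_everything(honest))
    print(service.can_delete_everything(impostor))
```

Any permission model layered on top of an unverified identity inherits this weakness, which is the motivation for Kerberos at the next security level.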
Minimal Security #1
Reduce Risk Exposure
How it works: Authentication
Web UIs
• LDAP and SAML authentication options
SQL Access
• LDAP/AD and Kerberos authentication options
Command Lines
• Kerberos authentication
• Automation provided by Cloudera Manager to leverage Active Directory (AD)
1. User authenticates to AD or KDC
2. Authenticated user gets Kerberos ticket
3. Ticket grants access to services, e.g. Impala
User [ssmith] Password [*****]
Kerberos
Strong authentication
KDC: Key Distribution Center
• MIT
• Active Directory (more common)
Principal: user@EXAMPLE.COM (primary: user, realm: EXAMPLE.COM)
[Diagram: user, KDC for realm EXAMPLE.COM, and Hadoop]
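The primary/realm split of a principal such as user@EXAMPLE.COM can be sketched as follows. This is a minimal parser for the common `primary[/instance]@REALM` form; it ignores escaped characters, and the service principal in the usage example is a made-up host name.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts."""
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"not a fully qualified principal: {name!r}")
    primary, _, instance = local.partition("/")
    return Principal(primary, instance or None, realm)


if __name__ == "__main__":
    print(parse_principal("user@EXAMPLE.COM"))
    # Hypothetical service principal: the instance is usually a host name.
    print(parse_principal("hdfs/worker1.example.com@EXAMPLE.COM"))
```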
Kerberos
Considerations in large enterprises
Time synchronization
CM Kerberos Wizard
• Configure AD to create a Kerberos principal for the CM server, and to delegate to CM the ability to create and manage Kerberos principals
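Time synchronization matters because Kerberos rejects requests whose timestamps differ too much from the KDC's clock. The sketch below assumes the usual 300-second tolerance (the MIT Kerberos default for `clockskew`); the function name and the timestamps are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumed tolerance: 300 seconds is the MIT Kerberos default for clockskew.
MAX_CLOCK_SKEW = timedelta(seconds=300)


def within_skew(client_time, kdc_time, tolerance=MAX_CLOCK_SKEW):
    """True if the two clocks differ by no more than the tolerance."""
    return abs(client_time - kdc_time) <= tolerance


if __name__ == "__main__":
    kdc = datetime(2018, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(within_skew(kdc + timedelta(minutes=2), kdc))   # host slightly ahead
    print(within_skew(kdc - timedelta(minutes=7), kdc))   # host drifted too far
```

In practice every cluster host and the KDC would typically follow the same NTP source so that this check never becomes the reason a ticket is refused.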
Kerberos Authentication
* LDAP over SSL
Authorization/Access Control
Access control lists (ACLs)
• HDFS file ACLs
• YARN job submission
• HBase ACLs
• Oozie ACLs
Sentry managed (RBAC)
• Hive
• Impala
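The role-based model used by Sentry can be pictured as a chain: users belong to groups, groups are granted roles, and roles carry privileges. The sketch below is a simplified in-memory model of that chain, not Sentry's API; the class names, the resource string format, and the sample data are all hypothetical, and it matches privileges exactly rather than modelling any object hierarchy.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    action: str      # e.g. "SELECT"
    resource: str    # e.g. "db=sales/table=orders" (made-up notation)


@dataclass
class Rbac:
    user_groups: dict = field(default_factory=dict)   # user  -> set of groups
    group_roles: dict = field(default_factory=dict)   # group -> set of roles
    role_privs: dict = field(default_factory=dict)    # role  -> set of Privilege

    def is_allowed(self, user, action, resource):
        """Walk user -> groups -> roles -> privileges looking for a match."""
        wanted = Privilege(action, resource)
        for group in self.user_groups.get(user, ()):
            for role in self.group_roles.get(group, ()):
                if wanted in self.role_privs.get(role, ()):
                    return True
        return False


if __name__ == "__main__":
    rbac = Rbac(
        user_groups={"ssmith": {"analysts"}},
        group_roles={"analysts": {"sales_reader"}},
        role_privs={"sales_reader": {Privilege("SELECT", "db=sales/table=orders")}},
    )
    print(rbac.is_allowed("ssmith", "SELECT", "db=sales/table=orders"))
    print(rbac.is_allowed("ssmith", "INSERT", "db=sales/table=orders"))
```

Because privileges attach to roles rather than to individual users, onboarding a new analyst only requires adding the person to the right directory group.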
Auditing
Backup/Disaster Recovery
Cloudera Backup/Disaster Recovery (BDR)
• A high-performance data replicator
• Copies incremental data from the source cluster on specified schedules
Supports
• Kerberos
• Data encryption
• HDFS replication to cloud
Kerberized BDR Best Practice
[Diagram: Cloudera BDR between Production (realm PROD.EXAMPLE.COM, with its KDC) and DR (realm DR.EXAMPLE.COM, with its KDC), linked by a cross-realm trust]
More Security #2
Managed, Secure, Protected
Data In-Motion Encryption
RPC encryption
Data transport encryption
• Supports AES CTR, up to 256-bit key length
HTTP TLS/SSL encryption
• No self-signed certificates in production
[Diagram: application, master nodes, and worker nodes connected by RPC encryption, transport encryption, and TLS/SSL]
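One reason to avoid self-signed certificates is that a strictly configured client will refuse them unless they are explicitly trusted. The sketch below shows one way a Python client might build such a strict TLS context with the standard `ssl` module; `strict_client_context` and its `ca_bundle` parameter are illustrative names, and the TLS 1.2 floor is an assumption rather than a requirement stated in the slides.

```python
import ssl


def strict_client_context(ca_bundle=None):
    """Client-side TLS context that verifies the server certificate chain."""
    # ca_bundle=None falls back to the system trust store.
    ctx = ssl.create_default_context(cafile=ca_bundle)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # create_default_context already enables both checks; the assertions
    # document the intent and guard against later accidental changes.
    assert ctx.verify_mode == ssl.CERT_REQUIRED
    assert ctx.check_hostname
    return ctx


if __name__ == "__main__":
    ctx = strict_client_context()
    print(ctx.verify_mode, ctx.minimum_version)
```

A server presenting a certificate that does not chain to a CA in the trust store would fail the handshake with this context, which is the behavior wanted in production.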
Data At-Rest Encryption
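Transparent encryption
Supports any Hadoop application
Encryption Zone
# Create a key named mykey in the configured key provider (KMS)
$ hadoop key create mykey
# Create an empty directory that will become the zone
$ hadoop fs -mkdir /zone
# Turn /zone into an encryption zone protected by mykey
$ hdfs crypto -createZone -keyName mykey -path /zone
[Diagram: directory tree with /tmp and /zone under /, files foo and bar, and /zone marked as the encryption zone]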
Key Management Server Deployment (non-prod)
[Diagram: HDFS NameNode and client connect to a Java Keystore KMS backed by a keystore file]
Separation of duties
• Encryption Zone Key (EZK) is stored in the KMS server
• The HDFS superuser cannot decrypt files
Key Management Server/Key Trustee Server Deployment
[Diagram: HDFS NameNode and client connect to two (or more) Key Trustee KMS instances; behind a firewall, an active Key Trustee Server synchronizes with a passive Key Trustee Server]
KMS+KTS+HSM Deployment
[Diagram: HDFS NameNode and client connect to two (or more) HSM KMS instances; behind a firewall, an active Key Trustee Server synchronizes with a passive Key Trustee Server, and each uses Key HSM to reach an HSM]
Troubleshooting: Encryption Performance Anomaly
• Configuration
• AES-NI Hardware acceleration
• OpenSSL library
• Entropy
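Two of the items above can be checked quickly on a host. The sketch below assumes an x86 Linux machine, where `/proc/cpuinfo` lists CPU features on `flags` lines and the kernel exposes its entropy estimate under `/proc/sys/kernel/random/entropy_avail`; the function names are illustrative, and no threshold for "enough" entropy is implied.

```python
from pathlib import Path


def has_aes_ni(cpuinfo_text: str) -> bool:
    """True if any 'flags' line in /proc/cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "aes" in value.split():
            return True
    return False


def read_entropy(path="/proc/sys/kernel/random/entropy_avail"):
    """Return the kernel's available entropy estimate, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    try:
        cpuinfo = Path("/proc/cpuinfo").read_text()
    except OSError:
        cpuinfo = ""   # not Linux, or /proc is unavailable
    print("AES-NI flag present:", has_aes_ni(cpuinfo))
    print("entropy_avail:", read_entropy())
```

Seeing the CPU flag only shows that the hardware supports acceleration; whether the JVM and OpenSSL library actually use it is a separate configuration question.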
Fine Grained Access Control with Apache Sentry
Most Security #3
Secure Data Vault
Level 3 Secure Data Vault
• All data, both at rest and in transit, is encrypted
• Key management system is fault-tolerant
• Auditing mechanisms comply with industry, government, and regulatory standards (PCI, HIPAA, NIST, for example)
• Auditing extends from the enterprise data hub (EDH) to the other systems that integrate with it
• Cluster administrators are well-trained
• Security procedures have been certified by an expert
• Cluster can pass technical review
Data Redaction
Personally identifiable information (PII)
• PCI-DSS, HIPAA
Best practices followed
Passwords
• Stored in credential files, not in configuration
Logs and queries
• Cloudera Manager
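Redaction of logs and queries generally comes down to pattern-based search and replace. The sketch below illustrates the idea with two hypothetical rules, one for 16-digit card numbers and one for `password=` values; the patterns, replacement strings, and sample inputs are assumptions for illustration and would need tuning for real data.

```python
import re

# Hypothetical rules: (compiled pattern, replacement).
REDACTION_RULES = [
    (re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"), "XXXX-XXXX-XXXX-XXXX"),
    (re.compile(r"(password\s*=\s*)\S+", re.IGNORECASE), r"\1[REDACTED]"),
]


def redact(text: str) -> str:
    """Apply every rule in order and return the masked text."""
    for pattern, replacement in REDACTION_RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(redact("SELECT * FROM pay WHERE card = '4111-1111-1111-1111'"))
    print(redact("jdbc connect user=etl password=hunter2"))
```

Rules like these are best applied before text is written to logs or audit stores, since masking after the fact leaves copies of the sensitive values behind.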
Full Encryption
Encrypt Data Spills
• MapReduce
• Impala
• Hive
• Flume
OS-level encryption
• Navigator Encrypt
How to secure your Cloudera cluster
Cloudera Documentation
Cloudera Professional Services security engagement
• Review security requirements and provide an overview of data security policies
• Audit architecture and current systems for security policies and best practices
• Custom tailor a security reference architecture
• Optimize OS and Java to take advantage of hardware-based crypto-acceleration
• Install and configure Kerberos with MIT Kerberos KDC or Active Directory
• Install and configure Sentry and Cloudera Navigator (license required)
• Install and configure Navigator Encrypt and Key Trustee with an HSM root of trust
• Review fine-grained permissions on sample data using Sentry
• Review audit and lineage on sample data using Navigator
• Use Cloudera Manager and Hue to review security integration for users
• Enable and configure HDFS encryption
https://www.cloudera.com/more/services-and-support/professional-services/security-integration-pilot.html
Cloudera online OnDemand security course
• Online self-paced training course: https://ondemand.cloudera.com
• Launch planned for mid-February 2018
• An estimated 3 days' worth of content, covering Cloudera security levels 1 and 2
• Currently about 375 slides, with 9 detailed chapters and 16 instructor demonstrations:
1. Security overview
2. Security Architecture
3. Host Security
4. Encrypting Data in motion
5. Authentication
6. Authorization
7. Encrypting Data at Rest
8. Auditing
9. Additional Considerations: Data Governance
Ondemand security course instructor guided demos
1. Potential Attack vectors
2. Securing the cluster hosts
3. Generating and managing keys for TLS
4. Configuring Cloudera Manager for TLS
5. Encrypting Data in Motion
6. Hadoop default authentication
7. Kerberizing Cluster with MIT Kerberos
8. Kerberizing Cluster with Active Directory
9. Configuring authorization with Cloudera Manager
10. Controlling access to YARN
11. Controlling access to HDFS
12. Controlling access to Tables
13. Enabling HDFS Encryption
14. Protecting local data with NavEncrypt
15. Using Navigator for auditing
16. Reassessing cluster security
Ondemand security course disclaimer
THIS IS REALLY IMPORTANT:
The examples in this course are based on CM/CDH 5.12, running in a cloud-based deployment on a
cluster using the CentOS 7.2 operating system.
Given the almost limitless permutations of possible configurations, including different versions of CDH,
Cloudera Manager, operating systems, directory servers, Kerberos servers, web browsers, and other
tools, as well as variations in policies, laws, and practices that affect each organization differently, it's
impossible for a training course to cover all aspects of security.
This course is meant to provide a background that will help you to understand many important concepts
and techniques, but is not intended as a replacement for the relevant documentation or a consulting
engagement with an expert who can provide advice based on your specific requirements.
• Disclaimers: due to the variety and permutations of security configurations
• Versions used: CDH 5.12 and CentOS 7.2
Ondemand security course scenario
• Many of our demonstrations are based on a hypothetical scenario
• However, the concepts should apply to nearly any organization
• Loudacre Mobile is a fast-growing wireless carrier
• Employees serving in a variety of roles
• Data ingested from many sources, in many formats
• Data processed by many tools
Ondemand security course environment
Comprehensive demonstration cluster
Sample chapter structure: Encrypting Data in Motion
• Encryption Fundamentals
• Certificates
• Key Management
  • Instructor-Led Demonstration: Generating and Managing Keys for TLS
• Configuring Cloudera Manager for TLS
  • Instructor-Led Demonstration: Configuring Cloudera Manager for TLS
• Encrypting Hadoop’s Data in Motion
  • Instructor-Led Demonstration: Encrypting Hadoop’s Data in Motion
• Essential Points
Register your interest for
OnDemand security course:
peter.rizvi@cloudera.com
Thank you

Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Plus de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 
Cloudera SDX
Cloudera SDXCloudera SDX
Cloudera SDX
 

Dernier

Entrepreneurial ecosystem- Wider context
Entrepreneurial ecosystem- Wider contextEntrepreneurial ecosystem- Wider context
Entrepreneurial ecosystem- Wider contextP&CO
 
Types of Cyberattacks - ASG I.T. Consulting.pdf
Types of Cyberattacks - ASG I.T. Consulting.pdfTypes of Cyberattacks - ASG I.T. Consulting.pdf
Types of Cyberattacks - ASG I.T. Consulting.pdfASGITConsulting
 
GUIDELINES ON USEFUL FORMS IN FREIGHT FORWARDING (F) Danny Diep Toh MBA.pdf
GUIDELINES ON USEFUL FORMS IN FREIGHT FORWARDING (F) Danny Diep Toh MBA.pdfGUIDELINES ON USEFUL FORMS IN FREIGHT FORWARDING (F) Danny Diep Toh MBA.pdf
GUIDELINES ON USEFUL FORMS IN FREIGHT FORWARDING (F) Danny Diep Toh MBA.pdfDanny Diep To
 
EUDR Info Meeting Ethiopian coffee exporters
EUDR Info Meeting Ethiopian coffee exportersEUDR Info Meeting Ethiopian coffee exporters
EUDR Info Meeting Ethiopian coffee exportersPeter Horsten
 
Fundamentals Welcome and Inclusive DEIB
Fundamentals Welcome and  Inclusive DEIBFundamentals Welcome and  Inclusive DEIB
Fundamentals Welcome and Inclusive DEIBGregory DeShields
 
5-Step Framework to Convert Any Business into a Wealth Generation Machine.pdf
5-Step Framework to Convert Any Business into a Wealth Generation Machine.pdf5-Step Framework to Convert Any Business into a Wealth Generation Machine.pdf
5-Step Framework to Convert Any Business into a Wealth Generation Machine.pdfSherl Simon
 
Introducing the Analogic framework for business planning applications
Introducing the Analogic framework for business planning applicationsIntroducing the Analogic framework for business planning applications
Introducing the Analogic framework for business planning applicationsKnowledgeSeed
 
WSMM Technology February.March Newsletter_vF.pdf
WSMM Technology February.March Newsletter_vF.pdfWSMM Technology February.March Newsletter_vF.pdf
WSMM Technology February.March Newsletter_vF.pdfJamesConcepcion7
 
Strategic Project Finance Essentials: A Project Manager’s Guide to Financial ...
Strategic Project Finance Essentials: A Project Manager’s Guide to Financial ...Strategic Project Finance Essentials: A Project Manager’s Guide to Financial ...
Strategic Project Finance Essentials: A Project Manager’s Guide to Financial ...Aggregage
 
Go for Rakhi Bazaar and Pick the Latest Bhaiya Bhabhi Rakhi.pptx
Go for Rakhi Bazaar and Pick the Latest Bhaiya Bhabhi Rakhi.pptxGo for Rakhi Bazaar and Pick the Latest Bhaiya Bhabhi Rakhi.pptx
Go for Rakhi Bazaar and Pick the Latest Bhaiya Bhabhi Rakhi.pptxRakhi Bazaar
 
Data Analytics Strategy Toolkit and Templates
Data Analytics Strategy Toolkit and TemplatesData Analytics Strategy Toolkit and Templates
Data Analytics Strategy Toolkit and TemplatesAurelien Domont, MBA
 
Healthcare Feb. & Mar. Healthcare Newsletter
Healthcare Feb. & Mar. Healthcare NewsletterHealthcare Feb. & Mar. Healthcare Newsletter
Healthcare Feb. & Mar. Healthcare NewsletterJamesConcepcion7
 
Planetary and Vedic Yagyas Bring Positive Impacts in Life
Planetary and Vedic Yagyas Bring Positive Impacts in LifePlanetary and Vedic Yagyas Bring Positive Impacts in Life
Planetary and Vedic Yagyas Bring Positive Impacts in LifeBhavana Pujan Kendra
 
Technical Leaders - Working with the Management Team
Technical Leaders - Working with the Management TeamTechnical Leaders - Working with the Management Team
Technical Leaders - Working with the Management TeamArik Fletcher
 
Implementing Exponential Accelerators.pptx
Implementing Exponential Accelerators.pptxImplementing Exponential Accelerators.pptx
Implementing Exponential Accelerators.pptxRich Reba
 
Pitch Deck Teardown: Xpanceo's $40M Seed deck
Pitch Deck Teardown: Xpanceo's $40M Seed deckPitch Deck Teardown: Xpanceo's $40M Seed deck
Pitch Deck Teardown: Xpanceo's $40M Seed deckHajeJanKamps
 
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...SOFTTECHHUB
 
Jewish Resources in the Family Resource Centre
Jewish Resources in the Family Resource CentreJewish Resources in the Family Resource Centre
Jewish Resources in the Family Resource CentreNZSG
 
Excvation Safety for safety officers reference
Excvation Safety for safety officers referenceExcvation Safety for safety officers reference
Excvation Safety for safety officers referencessuser2c065e
 

Dernier (20)

Entrepreneurial ecosystem- Wider context
Entrepreneurial ecosystem- Wider contextEntrepreneurial ecosystem- Wider context
Entrepreneurial ecosystem- Wider context
 
Types of Cyberattacks - ASG I.T. Consulting.pdf
Types of Cyberattacks - ASG I.T. Consulting.pdfTypes of Cyberattacks - ASG I.T. Consulting.pdf
Types of Cyberattacks - ASG I.T. Consulting.pdf
 
GUIDELINES ON USEFUL FORMS IN FREIGHT FORWARDING (F) Danny Diep Toh MBA.pdf
GUIDELINES ON USEFUL FORMS IN FREIGHT FORWARDING (F) Danny Diep Toh MBA.pdfGUIDELINES ON USEFUL FORMS IN FREIGHT FORWARDING (F) Danny Diep Toh MBA.pdf
GUIDELINES ON USEFUL FORMS IN FREIGHT FORWARDING (F) Danny Diep Toh MBA.pdf
 
EUDR Info Meeting Ethiopian coffee exporters
EUDR Info Meeting Ethiopian coffee exportersEUDR Info Meeting Ethiopian coffee exporters
EUDR Info Meeting Ethiopian coffee exporters
 
Fundamentals Welcome and Inclusive DEIB
Fundamentals Welcome and  Inclusive DEIBFundamentals Welcome and  Inclusive DEIB
Fundamentals Welcome and Inclusive DEIB
 
5-Step Framework to Convert Any Business into a Wealth Generation Machine.pdf
5-Step Framework to Convert Any Business into a Wealth Generation Machine.pdf5-Step Framework to Convert Any Business into a Wealth Generation Machine.pdf
5-Step Framework to Convert Any Business into a Wealth Generation Machine.pdf
 
Introducing the Analogic framework for business planning applications
Introducing the Analogic framework for business planning applicationsIntroducing the Analogic framework for business planning applications
Introducing the Analogic framework for business planning applications
 
WSMM Technology February.March Newsletter_vF.pdf
WSMM Technology February.March Newsletter_vF.pdfWSMM Technology February.March Newsletter_vF.pdf
WSMM Technology February.March Newsletter_vF.pdf
 
Strategic Project Finance Essentials: A Project Manager’s Guide to Financial ...
Strategic Project Finance Essentials: A Project Manager’s Guide to Financial ...Strategic Project Finance Essentials: A Project Manager’s Guide to Financial ...
Strategic Project Finance Essentials: A Project Manager’s Guide to Financial ...
 
Go for Rakhi Bazaar and Pick the Latest Bhaiya Bhabhi Rakhi.pptx
Go for Rakhi Bazaar and Pick the Latest Bhaiya Bhabhi Rakhi.pptxGo for Rakhi Bazaar and Pick the Latest Bhaiya Bhabhi Rakhi.pptx
Go for Rakhi Bazaar and Pick the Latest Bhaiya Bhabhi Rakhi.pptx
 
Data Analytics Strategy Toolkit and Templates
Data Analytics Strategy Toolkit and TemplatesData Analytics Strategy Toolkit and Templates
Data Analytics Strategy Toolkit and Templates
 
Healthcare Feb. & Mar. Healthcare Newsletter
Healthcare Feb. & Mar. Healthcare NewsletterHealthcare Feb. & Mar. Healthcare Newsletter
Healthcare Feb. & Mar. Healthcare Newsletter
 
Planetary and Vedic Yagyas Bring Positive Impacts in Life
Planetary and Vedic Yagyas Bring Positive Impacts in LifePlanetary and Vedic Yagyas Bring Positive Impacts in Life
Planetary and Vedic Yagyas Bring Positive Impacts in Life
 
Technical Leaders - Working with the Management Team
Technical Leaders - Working with the Management TeamTechnical Leaders - Working with the Management Team
Technical Leaders - Working with the Management Team
 
Toyota and Seven Parts Storage Techniques
Toyota and Seven Parts Storage TechniquesToyota and Seven Parts Storage Techniques
Toyota and Seven Parts Storage Techniques
 
Implementing Exponential Accelerators.pptx
Implementing Exponential Accelerators.pptxImplementing Exponential Accelerators.pptx
Implementing Exponential Accelerators.pptx
 
Pitch Deck Teardown: Xpanceo's $40M Seed deck
Pitch Deck Teardown: Xpanceo's $40M Seed deckPitch Deck Teardown: Xpanceo's $40M Seed deck
Pitch Deck Teardown: Xpanceo's $40M Seed deck
 
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...
 
Jewish Resources in the Family Resource Centre
Jewish Resources in the Family Resource CentreJewish Resources in the Family Resource Centre
Jewish Resources in the Family Resource Centre
 
Excvation Safety for safety officers reference
Excvation Safety for safety officers referenceExcvation Safety for safety officers reference
Excvation Safety for safety officers reference
 

Cloudera training: secure your Cloudera cluster

  • 1. © Cloudera, Inc. All rights reserved. Cloudera training: secure your Cloudera cluster
  • 2. The demand for skills is high and Hadoop is the future. Customers cannot afford to move slowly in staffing their Big Data projects. Customers are building plans to ensure projects are staffed with skilled employees and supported by a qualified services provider. [Chart: Job Trends from Indeed.com] [Poll: What are you most concerned about when it comes to your readiness for big data and Hadoop? Cloudera MDP webinar poll results, July 2016]
  • 3. Why Cloudera training? Aligned to best practices and the pace of change
      1. Broadest range of courses: learning paths for Developer, Admin, Analyst
      2. Most experienced instructors: more than 40,000 trained since 2009
      3. Leader in certification: over 12,000 accredited Cloudera professionals
      4. Trusted source for training: 100,000+ people have attended online courses
      5. State of the art curriculum: courses updated as Hadoop evolves
      6. Widest geographic coverage: most classes offered, in 50 cities worldwide plus online
      7. Most relevant platform & community: CDH deployed more than all other distributions combined
      8. Depth of training material: hands-on labs and VMs support live instruction
      9. Ongoing learning: video tutorials and e-learning complement training
      10. Commitment to big data education: university partnerships to teach Hadoop in colleges
  • 4. Creating leaders in the field: training enables Big Data solutions and innovation
      • 94% would recommend or highly recommend Cloudera training to friends or colleagues
      • 88% indicate Cloudera training provided the Hadoop expertise their roles require
      • 66% draw on lessons from Cloudera training on at least a monthly basis
      • 40% develop new apps or perform business-critical analyses as a result of training alone
      Sources: Cloudera Past Public Training Participant Study, December 2012. Cloudera Customer Satisfaction Study, January 2013.
  • 5. What is available from Cloudera University?
      • Private training: course delivered at a location of the customer's choice to an internal audience
      • Public training: courses regularly scheduled around the globe; schedule available on the web
      • Virtual training: live training accessed via the internet; available for public and private courses
      • OnDemand training: pre-recorded lectures with identical content and exercises as the live training options
      • Certification: rigorously developed and meaningful bodies of knowledge
  • 6. Suggested Cloudera University curricula
      • Developers: Python/Scala training; Developer for Spark and Hadoop; CCA: Spark and Hadoop Developer; Spark ML & Kafka modules; topic-specific training (Search, HBase); hands-on practice; CCP: Data Engineer
      • Administrators: Cloudera Administration training; CCA: Administrator; Cloudera Security OnDemand
      • Data Analysts/Data Scientists: Data Analyst: Using Hive, Pig & Impala; CCA: Data Analyst; Cloudera Data Science
  • 7. Security for Hadoop. Carlo Lazzaris | Technical Instructor
  • 8. Security Webinar Agenda
      1. The need for Hadoop security: hacker news and legal regulations
      2. Cloudera security implementation: five levels of security
      3. How to secure your Cloudera cluster: Cloudera documentation, Cloudera Professional Services, Cloudera OnDemand security course
  • 9. The need for Hadoop security
  • 10. Unguarded data stores are the victims
  • 11. Regulatory Compliance: organizations can be fined up to 4% of annual global turnover or €20 million, whichever is higher, for breaching GDPR
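The fine ceiling on slide 11 is simple arithmetic, and a short sketch may make it concrete. It assumes the upper tier of GDPR administrative fines (Article 83(5)), where the cap is the higher of the two amounts; the function name and the sample turnover figures are hypothetical.

```python
from decimal import Decimal

# Upper tier of GDPR administrative fines (Article 83(5)):
# EUR 20 million or 4% of worldwide annual turnover, whichever is higher.
FLAT_CAP_EUR = Decimal("20000000")
TURNOVER_RATE = Decimal("0.04")


def max_gdpr_fine(annual_global_turnover_eur: Decimal) -> Decimal:
    """Return the maximum upper-tier fine for a given annual turnover (hypothetical helper)."""
    if annual_global_turnover_eur < 0:
        raise ValueError("turnover must be non-negative")
    return max(FLAT_CAP_EUR, annual_global_turnover_eur * TURNOVER_RATE)


if __name__ == "__main__":
    # Sample turnovers are made up for illustration.
    for turnover in (Decimal("100000000"), Decimal("2000000000")):
        print("turnover:", turnover, "EUR, fine ceiling:", max_gdpr_fine(turnover), "EUR")
```

For organizations with turnover below €500 million the flat €20 million figure is the binding ceiling; above that, the 4% term takes over.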
  • 12. Cloudera security implementation
  • 13. Cloudera Enterprise CDH: the modern platform for machine learning and analytics, optimized for the cloud. [Diagram labels: Extensible Services, Core Services, Data Engineering, Operational Database, Analytic Database, Data Catalog, Ingest & Replication, Security, Governance, Workload Management, Data Science; Storage Services: S3, ADLS, HDFS, Kudu]
  • 14. Open platform services: built for multi-function analytics | optimized for cloud
      • Unified security: protects sensitive data with consistent controls, even for transient and recurring workloads
      • Consistent governance: enables secure self-service access to all relevant data and increases compliance
      • Easy workload management: increases user productivity and boosts job predictability
      • Flexible ingest and replication: aggregates a single copy of all data, provides disaster recovery, and eases migration
      • Shared catalog: defines and preserves structure and business context of data for new applications and partner solutions
  • 15. Cloudera Enterprise-Grade Security and Governance
      • Identity: validate users by membership in the enterprise directory. Technical concepts: authentication, user/group mapping
      • Access: defining what users and applications can do with data. Technical concepts: permissions, authorization
      • Visibility: reporting on where data came from and how it's being used. Technical concepts: auditing, lineage
      • Data Protection: shielding data in the cluster from unauthorized visibility. Technical concepts: encryption at rest & in motion
      Products shown: Cloudera Manager, Apache Sentry, Cloudera Navigator, Navigator Encrypt & Key Trustee
  • 16. Cloudera Certified Technology Partners: Data Sources (connected machines/data sources, other data sources), Data Ingest, Process, Refine & Prep, Data Discovery, Advanced Analytics
  • 17. A certified product ensures it integrates securely
      • Authentication: authenticate via Kerberos or LDAP
      • Authorization: handle Apache Sentry with Hive, Impala, Search, HDFS
      • Encryption: support HDFS transport encryption and at-rest encryption; support SSL/TLS connection encryption
  • 18. Vulnerability Response and Process: vulnerability reports (upstream, internal, external), fix, publish
  • 19. Cluster Security Levels
  • 20. Cloudera Enterprise: the modern platform for machine learning and analytics, optimized for the cloud
  • 21. Enterprise Encryption Performance
  • 22. Disclaimer: this talk serves as a general guideline for security implementation on Hadoop. The actual implementation procedures and scope of implementation vary on a case-by-case basis, and should be assessed by Cloudera's Professional Services team or certified Cloudera SI Partners.
  • 23. Non-secure (#0): Data Free for All
  • 24. [Diagram labels: Firewall, ActiveDirectory/KDC, Hadoop cluster, Cloudera Manager, Gateway node, Cloudera Worker nodes, Datacenter, Applications]
  • 25. 4 modes of Identity Management: 1. Simple authentication, 2. Kerberos, 3. LDAP, 4. SAML. File group ownership: AD integration; SSSD or Centrify. Consideration in large enterprises.
  • 26. Simple Authentication: detect the user. [Diagram labels: Firewall, ActiveDirectory, Cloudera Manager, Master nodes, Worker nodes (SSSD/Centrify)]
  • 27. Simple authentication = no authentication
  • 28. Minimal Security (#1): Reduce Risk Exposure
  • 29. How it works: Authentication
      • Web UIs: LDAP and SAML authentication options
      • SQL access: LDAP/AD and Kerberos authentication options
      • Command lines: Kerberos authentication; automation provided by Cloudera Manager to leverage Active Directory (AD)
      Flow: the user authenticates to AD or the KDC; the authenticated user gets a Kerberos ticket; the ticket grants access to services, e.g. Impala. [Login prompt: User [ssmith], Password [*****]]
  • 30. Kerberos: strong authentication. KDC (Key Distribution Center): MIT or Active Directory (more common). In the principal user@EXAMPLE.COM, "user" is the primary and "EXAMPLE.COM" is the realm. Hadoop maps user@EXAMPLE.COM to the local user "user".
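The primary/realm split on slide 30 can be sketched in a few lines. This is a minimal illustration, assuming the usual `primary[/instance]@REALM` layout of Kerberos principal names; the helper names, the service principal in the usage note, and the single-realm mapping rule are hypothetical and much simpler than Hadoop's configurable mapping rules.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts."""
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"not a full principal name: {name!r}")
    primary, _, instance = local.partition("/")
    return Principal(primary, instance or None, realm)


def short_name(name: str, default_realm: str) -> str:
    """Map a principal in the default realm to a local user name (hypothetical rule)."""
    principal = parse_principal(name)
    if principal.realm != default_realm:
        raise ValueError(f"no mapping for realm {principal.realm}")
    return principal.primary


if __name__ == "__main__":
    print(parse_principal("user@EXAMPLE.COM"))
    print(short_name("user@EXAMPLE.COM", "EXAMPLE.COM"))
```

A service principal such as the made-up `hdfs/host1.example.com@EXAMPLE.COM` parses the same way, with the host name landing in the `instance` field.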
  • 31. Kerberos: considerations in large corporations
      • Time synchronization
      • CM Kerberos Wizard: configure AD to create a Kerberos principal for the CM server, and to delegate to CM the ability to create and manage Kerberos principals
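Time synchronization matters because Kerberos rejects requests whose timestamps are too far from the KDC's clock. The sketch below assumes the 300-second tolerance that is the documented default for MIT Kerberos (`clockskew`); the function name and the sample timestamps are hypothetical, and a real deployment would rely on NTP or a similar service rather than a check like this.

```python
from datetime import datetime, timedelta, timezone

# Assumed tolerance: 300 seconds, the documented MIT Kerberos default.
DEFAULT_MAX_SKEW = timedelta(seconds=300)


def within_skew(client: datetime, kdc: datetime,
                max_skew: timedelta = DEFAULT_MAX_SKEW) -> bool:
    """Return True if the two timestamps differ by no more than max_skew."""
    return abs(client - kdc) <= max_skew


if __name__ == "__main__":
    # Sample timestamps are made up for illustration.
    kdc_time = datetime(2018, 2, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(within_skew(kdc_time + timedelta(minutes=2), kdc_time))
    print(within_skew(kdc_time - timedelta(minutes=10), kdc_time))
```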
  • 33. Kerberos Authentication (* LDAP over SSL)
  • 34. Authorization/Access Control
      • Access control lists (ACLs): HDFS file ACLs, YARN job submission, HBase ACLs, Oozie ACLs
      • Sentry-managed (RBAC): Hive, Impala
  • 35. Auditing
  • 36. Backup/Disaster Recovery: Cloudera Backup/Disaster Recovery (BDR)
      • A high-performance data replicator
      • Copies incremental data on the source cluster at specified schedules
      • Supports Kerberos, data encryption, and HDFS replication to cloud
  • 37. Kerberized BDR Best Practice: Cloudera BDR replicates from Production to DR, with a cross-realm trust between the KDCs of PROD.EXAMPLE.COM and DR.EXAMPLE.COM
  • 38. More Security (#2): Managed, Secure, Protected
  • 39. Data In-Motion Encryption
      • RPC encryption
      • Data transport encryption: supports AES CTR, up to 256-bit key length
      • HTTP TLS/SSL encryption: no self-signed certificates in production
      [Diagram labels: Application, Master nodes, Worker nodes; RPC encryption, transport encryption, TLS/SSL]
  • 40. Data At-Rest Encryption: transparent encryption; supports any Hadoop application. Creating an encryption zone:
      $ hadoop key create mykey
      $ hadoop fs -mkdir /zone
      $ hdfs crypto -createZone -keyName mykey -path /zone
    [Diagram: directory tree with /, /tmp, and /zone; /zone is the encryption zone and contains foo and bar]
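The three commands on slide 40 follow a fixed order: create the key, create the directory, then turn the directory into an encryption zone bound to that key. The sketch below only builds those command lines as argument lists and prints them; it does not run anything, and the helper name is hypothetical. The command names and flags are the ones shown on the slide.

```python
import shlex
from typing import List


def encryption_zone_commands(key_name: str, path: str) -> List[List[str]]:
    """Build the slide's three commands as argv lists (nothing is executed)."""
    if not path.startswith("/"):
        raise ValueError("encryption zone path must be absolute")
    return [
        ["hadoop", "key", "create", key_name],           # 1. create the zone key
        ["hadoop", "fs", "-mkdir", path],                # 2. create the directory
        ["hdfs", "crypto", "-createZone",                # 3. bind the key to the directory
         "-keyName", key_name, "-path", path],
    ]


if __name__ == "__main__":
    for argv in encryption_zone_commands("mykey", "/zone"):
        print("$", shlex.join(argv))
```

Keeping the commands as lists rather than one shell string avoids quoting problems if a key name or path ever contains spaces.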
  • 41. Key Management Server Deployment (non-prod). [Diagram labels: HDFS NameNode, Client, Java KeyStore KMS, keystore file] Separation of duties:
      • The Encryption Zone Key (EZK) is stored in the KMS server
      • The HDFS superuser cannot decrypt files
  • 42. Key Management Server/Key Trustee Server Deployment. [Diagram labels: HDFS NameNode, Client, two (or more) Key Trustee KMS instances, Firewall, Key Trustee Server (Active) synchronized with Key Trustee Server (Passive)]
  • 43. KMS+KTS+HSM Deployment. [Diagram labels: HDFS NameNode, Client, two (or more) HSM KMS instances, Firewall, Key Trustee Server (Active) synchronized with Key Trustee Server (Passive), Key HSM, HSM]
  • 44. Troubleshooting: Encryption Performance Anomaly
      • Configuration
      • AES-NI hardware acceleration
      • OpenSSL library
      • Entropy
  • 45. Fine-Grained Access Control with Apache Sentry
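Role-based access control of the kind Sentry provides can be pictured as a chain of lookups: users belong to groups, groups are granted roles, and roles carry privileges. The sketch below is a deliberately simplified model of that chain, not Sentry's API; every group, role, and object name is hypothetical sample data (only the user name ssmith appears in the deck), and it ignores Sentry's object hierarchy (server, database, table, column).

```python
from typing import Dict, Set, Tuple

Privilege = Tuple[str, str]  # (action, object), e.g. ("SELECT", "db=sales")

# All names below are hypothetical sample data.
USER_GROUPS: Dict[str, Set[str]] = {"ssmith": {"analysts"}, "jdoe": {"etl"}}
GROUP_ROLES: Dict[str, Set[str]] = {
    "analysts": {"sales_reader"},
    "etl": {"sales_writer"},
}
ROLE_PRIVILEGES: Dict[str, Set[Privilege]] = {
    "sales_reader": {("SELECT", "db=sales")},
    "sales_writer": {("SELECT", "db=sales"), ("INSERT", "db=sales")},
}


def is_allowed(user: str, action: str, obj: str) -> bool:
    """Resolve user -> groups -> roles -> privileges and check for a match."""
    for group in USER_GROUPS.get(user, set()):
        for role in GROUP_ROLES.get(group, set()):
            if (action, obj) in ROLE_PRIVILEGES.get(role, set()):
                return True
    return False


if __name__ == "__main__":
    print(is_allowed("ssmith", "SELECT", "db=sales"))
    print(is_allowed("ssmith", "INSERT", "db=sales"))
```

Because privileges attach to roles rather than to individual users, changing what analysts may do means editing one role instead of every account.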
  • 46. Most Security (#3): Secure Data Vault
  • 47. Level 3: Secure Data Vault
      • All data, both data-at-rest and data-in-transit, is encrypted
      • Key management system is fault-tolerant
      • Auditing mechanisms comply with industry, government, and regulatory standards (PCI, HIPAA, NIST, for example)
      • Auditing extends from the EDH to the other systems that integrate with it
      • Cluster administrators are well-trained
      • Security procedures have been certified by an expert
      • Cluster can pass technical review
  • 48. Data Redaction
      • Personally identifiable information: PCI-DSS, HIPAA; best practices followed
      • Passwords: stored in credential files, not in configuration
      • Logs, queries: Cloudera Manager
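Redacting logs and queries usually comes down to pattern matching: find text that looks sensitive and replace it before it is written out. The sketch below shows the idea with a single made-up rule for 16-digit card numbers; the pattern, the mask, and the function name are assumptions for illustration and are not Cloudera Manager's rule format.

```python
import re

# Hypothetical rule: mask 16-digit card numbers written as four groups of
# four digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")
MASK = "XXXX-XXXX-XXXX-XXXX"


def redact(line: str) -> str:
    """Replace anything that looks like a card number with a fixed mask."""
    return CARD_PATTERN.sub(MASK, line)


if __name__ == "__main__":
    # The query text is a made-up example.
    print(redact("SELECT * FROM payments WHERE card='4111-1111-1111-1111'"))
```

A fixed mask hides the value entirely; a production rule set would typically cover more patterns (for example national ID numbers) and be tested against real log samples.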
  • 49. 50© Cloudera, Inc. All rights reserved. Full Encryption Encrypt Data Spills • MapReduce • Impala • Hive • Flume OS-level encryption • Navigator Encrypt
  • 50. 51© Cloudera, Inc. All rights reserved. How to secure your Cloudera cluster
  • 51. 52© Cloudera, Inc. All rights reserved. Cloudera Documentation
  • 52. 53© Cloudera, Inc. All rights reserved. Cloudera Professional Services security engagement • Review security requirements and provide an overview of data security policies • Audit architecture and current systems for security policies and best practices • Custom tailor a security reference architecture • Optimize OS and Java to take advantage of hardware-based crypto-acceleration • Install and configure Kerberos with MIT Kerberos KDC or Active Directory • Install and configure Sentry and Cloudera Navigator (license required) • Install and configure Navigator Encrypt and Key Trustee with an HSM root of trust • Review fine-grain permissions on sample data using Sentry • Review audit and lineage on sample data using Navigator • Use Cloudera Manager and Hue to review security integration for users • Enable and configure HDFS encryption https://www.cloudera.com/more/services-and-support/professional-services/security-integration-pilot.html
  • 53. Cloudera online OnDemand security course • Online self-paced training course: https://ondemand.cloudera.com • Launch planned for mid-February 2018 • An estimated 3 days' worth of content at Cloudera security levels 1 and 2 • Currently about 375 slides, with 9 detailed chapters and 16 instructor demonstrations: 1. Security Overview 2. Security Architecture 3. Host Security 4. Encrypting Data in Motion 5. Authentication 6. Authorization 7. Encrypting Data at Rest 8. Auditing 9. Additional Considerations: Data Governance
  • 54. OnDemand security course instructor-guided demos 1. Potential Attack Vectors 2. Securing the Cluster Hosts 3. Generating and Managing Keys for TLS 4. Configuring Cloudera Manager for TLS 5. Encrypting Data in Motion 6. Hadoop Default Authentication 7. Kerberizing a Cluster with MIT Kerberos 8. Kerberizing a Cluster with Active Directory 9. Configuring Authorization with Cloudera Manager 10. Controlling Access to YARN 11. Controlling Access to HDFS 12. Controlling Access to Tables 13. Enabling HDFS Encryption 14. Protecting Local Data with Navigator Encrypt 15. Using Navigator for Auditing 16. Reassessing Cluster Security
  • 55. OnDemand security course disclaimer THIS IS REALLY IMPORTANT: The examples in this course are based on CM/CDH 5.12, running in a cloud-based deployment on a cluster using the CentOS 7.2 operating system. Given the almost limitless permutations of possible configurations, including different versions of CDH, Cloudera Manager, operating systems, directory servers, Kerberos servers, web browsers, and other tools, as well as variations in policies, laws, and practices that affect each organization differently, it's impossible for a training course to cover all aspects of security. This course is meant to provide a background that will help you to understand many important concepts and techniques, but is not intended as a replacement for the relevant documentation or a consulting engagement with an expert who can provide advice based on your specific requirements. • Disclaimers: needed because of the variety and permutations of security configurations • Versions used: CDH 5.12 and CentOS 7.2
  • 56. OnDemand security course scenario • Many of our demonstrations are based on a hypothetical scenario • However, the concepts should apply to nearly any organization • Loudacre Mobile is a fast-growing wireless carrier • Employees serving in a variety of roles • Data ingested from many sources, in many formats • Data processed by many tools
  • 57. OnDemand security course environment
  • 58. Comprehensive demonstration cluster
  • 59. Sample chapter structure: Encrypting Data in Motion • Encryption Fundamentals • Certificates • Key Management ◦ Instructor-Led Demonstration: Generating and Managing Keys for TLS • Configuring Cloudera Manager for TLS ◦ Instructor-Led Demonstration: Configuring Cloudera Manager for TLS • Encrypting Hadoop’s Data in Motion ◦ Instructor-Led Demonstration: Encrypting Hadoop’s Data in Motion • Essential Points
  • 60. Register your interest for OnDemand security course: peter.rizvi@cloudera.com
  • 61. Thank you

Editor's notes

  1. Markets, and customers, can only expand as quickly as the human element is able to support them. Right now the demand for qualified big data professionals is far outpacing the supply. Maintaining a training function is critical for Cloudera because we need to maintain a capable delivery ecosystem that allows our customers to thrive within the Hadoop environment. Recruitment is one option for organizations to overcome this barrier, but that path comes with an additional challenge: finding the right candidates. When it comes to emerging technology skills, it’s a seller’s market. There is significant competition for a finite pool of skilled technologists, and this competition will only increase as the use of this technology increases. Faced with an ever-tightening supply of qualified job applicants, organizations are finding that the costs to recruit new employees far exceed the cost to train existing ones, and also that current employees are more than willing to be trained. The need for IT talent is only going to increase in an ever-expanding range of industries. Consider that by 2020, GE, known primarily as a manufacturer, expects to generate $15 billion from software, which would make it one of the top 10 software companies in the world. Or consider that 70 percent of Monsanto’s total jobs are already in science, technology, engineering, or math. Certainly many of those are in chemical and crop engineering, but increasingly, many are in IT, analytics, the Internet of Things, and digital operations. Monsanto is competing for skills not just with other agribusinesses but with companies in all industries. Organizations need to consider the cost of recruitment and attrition. A majority of analyses on the topic of training confirm that employees who receive training are more likely to remain with their current employers. It allows them to learn new skills, and shows that their employers are investing in them. 
For technologists, Hadoop, Spark, and the other projects that compose our platform open up a world of possibilities and curiosity. It is challenging and rewarding. We have several customers that build out robust Hadoop training plans as a benefit to their employees, and the returns they see in innovation on the platform and in employee retention make the cost of training a major value when viewed across the spectrum of both short- and long-term returns. The evolution of the data center in the past few decades has meant that IT decisions are now critical not just for back-office operations but for nearly every aspect of a business. With regard to “big data”, the technologies leveraged are closely linked to an organization’s customers and markets. As such, business leaders are tasked with transforming their business to accommodate the realities of the “data-driven” market. In some cases this means updating hardware and implementing new software, but also upgrading the skills of their internal staff. If the talent of your staff is a concern, you are not alone. Cloudera, and analyst firms such as IDC, have polled organizations about enterprise software deployments. Not surprisingly, one of the primary areas of concern for Cloudera prospects and customers is the skills of their staff. This is a new way of computing, and harnessing the benefits of a Cloudera subscription requires employees familiar with the tools included in the platform, and an understanding of how to best leverage them for their use case. IDC looked at projects more generally, but solicited input from over 500 managers implementing IT projects on what were the critical factors in the success of a project. Since we are discussing training, and building out a team of experts, on this call, I’m guessing you are assuming it was not the software, not clearly defined business objectives, or a solid project plan which predicated success. 
Overwhelmingly, managers ranked the skill and dedication of the project team as the factor which played the largest part in the success of their project. We want to make sure that customers include the human element needed to roll out a successful project as they consider a Cloudera subscription.
  2. I’ve alluded to some of these options earlier in the presentation, but to be clear about our delivery options: we offer both public and private training. Public training courses are scheduled around the globe by Cloudera and by our Authorized Training Partners. Authorized Training Partner instructors go through the same procedures as Cloudera instructors, regularly provide field services in their regions, and allow for local-language delivery in areas where we do not have direct coverage. Public training schedules can be found on Cloudera’s website, where you can search by course title and/or location of interest. Public training is a good option if you have just a few team members who need training, or you need to get someone ramped up in a short timeframe. Students are able to interact with a live instructor and with their peers from other organizations implementing Cloudera solutions. Private training is reserved for a customer who wants their entire team to be trained. Normally we say that if you have seven or more students who need the same training class, it’s worth your while to explore our private training option. We’ll send an instructor to a location of your choice to deliver training specific to your needs. Usually the training is one of the courses that I’ve described earlier in this presentation, but if needed, we can also customize the content to align it with your business objectives. To be clear, “customization” is not new content creation; it is creating an agenda from our portfolio of content that makes sense for the customer. Some examples would be adding Spark ML or JEP to Spark and Hadoop training to make it a five-day course, or cutting Pig from Data Analyst training to make it a three-day course. We generally recommend not trying to customize a course by pulling disparate topics from many classes: it usually ends up having no flow or connection, and the students leave with more questions than answers. 
Our courses build on concepts throughout the duration of the class. Customization is encouraged, but shouldn’t be abused. Private training courses are available for “up to 10” or “up to 16” students. Virtual training is live training that is delivered over the internet. Both public and private classes can be delivered in this manner. On the public side, it’s a popular option for individuals who are not local to one of our training locations. Private customers with geographically dispersed teams also find it a way to save on the travel costs it would take to bring the team to a central location. OnDemand training is a library of pre-recorded training classes, which allows for 24x7, self-paced training in a searchable environment. Our entire portfolio of content is available in this format, and students use a cloud-based lab environment to complete the same hands-on exercises we deliver in the live classrooms. Courses can be bought as a library, or by individual title. Certification I’ve touched on earlier. Certifications may be bought in bulk via PO, or purchased directly via our website. Certification candidates are remotely monitored, and are not required to go into a testing center to complete the exam. All you need is an internet connection. Prices range from $295 for CCA-level exams to $400 for CCP: Data Engineer, or $600 per CCP: Data Science exam.
  3. … and here is what I talked about in the past three slides, in summary. Over time, we will be adding courses to the Administrator training path focused on Security, Cloud, and Architecture – look for those in the next calendar year. We also have plans to iterate and/or augment our Developer, Data Analyst, and Data Scientist content to reflect the evolution of the technology.
  4. This talk is mainly about security implementation from both an engineering and a support perspective.
  5. Data breach incidents are increasing year by year. This year alone there have been a number of high-profile breaches. Security is built deep into Hadoop, but it does not work out of the box. Rome was not built in a day. As you will learn during your security implementation process, it takes a lot of configuration and best practices to make a secure Hadoop cluster. Good news: Cloudera Manager and Navigator are there to the rescue! Cloudera’s platform is built on top of Apache Hadoop technology. It is the first Hadoop platform to achieve PCI compliance.
  6. New York State Department of Financial Services. Breach Notification; Right to Access; Right to be Forgotten; Data Portability; Privacy by Design; Data Protection Officers.
  7. But obviously it takes more than good people and processes. You need the right technology. Let’s get down to brass tacks on what the software is about. We’re based on an open source core: a complete, integrated enterprise platform leveraging open source. HOSS business model: there is a core set of platform capabilities, and we contribute actively to that community; we then layer value-added software on top. That’s how we run our business. But what’s truly differentiating about our platform is the enterprise experience you get. It’s why we’re able to claim 7 of the top ten banks and 9 of the top ten telcos as our customers. For regulated industries, the enterprise experience is critical. Multi-cloud: no vendor lock-in; work in the environment of your choice; better pricing leverage. Managed TCO: multiple pricing and deployment options. Integrated: integrated components with shared metadata, security and operations. Secure: protect sensitive data from unauthorized access with encryption and key management. Compliance: full auditing and visibility. Governance: ensure data veracity.
  8. Apps share data, rather than data being replicated for each app. Lower costs, because there is less data to replicate. More secure, because data is in one central location. Easier to build apps, because data is easily accessible. Open architecture to share data with other teams and workloads, including data science.
  9. Apps share data, rather than data being replicated for each app. Lower costs, because there is less data to replicate. More secure, because data is in one central location. Easier to build apps, because data is easily accessible. Open architecture to share data with other teams and workloads, including data science.
  10. As a customer, you will most likely not interact with Cloudera’s platform directly. Typically customers access Cloudera’s platform indirectly through partner products. To ensure the same security protocols are not breached, we certify partner products with security in mind. For the purpose of this talk, I am going to briefly mention Cloudera’s certification process from a security perspective. Customers should also hire Cloudera-certified administrators, or engage professional services from Cloudera SI partners.
  11. A little bit on partner product certification https://docs.google.com/a/cloudera.com/document/d/1XwRV_bVZrM90JsPhHxLYAgd6vCdvT7qQ-k8eIQ2QYsk/edit?usp=sharing
  12. Upstream = reports coming from an Apache project. Each Apache project has a private security@ mailing alias, and we obey Apache’s security policy. Internal = reports coming from within Cloudera. Cloudera Engineering runs several security weakness detection tools looking for security issues in the software. External = reports coming from a third party or a customer.
  13. Cloudera works hard to provide security on top of the big data platform. In this talk, I will present the best practices and common pitfalls of security implementation on Hadoop, based on my experience working with customers. Source: https://www.cloudera.com/documentation/enterprise/latest/topics/sg_edh_overview.html#topic_ads_t2q_1r Achieving data security is costly. Depending on its use cases and the sensitivity of its data, an enterprise may decide which level of security it wants. Typically, enterprises choose to implement security on Hadoop step by step, or hire Cloudera PS to make a custom security implementation plan and complete these steps in one shot.
  14. https://cloudera.app.box.com/files/0/f/6321638305/1/f_56252438130 TPC-DS: the impact is very small. This was tested with Key Trustee; HSM is currently very slow. AES-NI. As the results below show, the overhead of using encryption was 2% in query execution time and 3.1% in CPU time.
  15. A secure system takes more than just a good product. It also requires experienced people to integrate it and operate it. These people must receive the proper training. Technology: Cloudera’s platform and certified partners’ products, post-sales support. People: Cloudera PS team or SI partners, consulting firms, customer’s admins, users. Process: SOPs, documentation, regular audits, compliance plan (not covered in this talk).
  16. Depend on existing firewalls.
  17. Leverage existing firewall mechanisms in the enterprise to set up a perimeter. This is the first line of defense. The firewall exposes only the gateway nodes for submitting jobs, and the CM and CN interfaces. System chart: CM, master nodes (HA), worker nodes, firewalls.
  18. Cloudera’s platform does not manage user authentication itself. Instead, it relies on an external authentication mechanism, such as Kerberos, LDAPS or AD. With simple authentication, Hadoop takes the user name from the local operating system user name, but it is too much effort to keep accounts consistent across hosts, so use AD + SSSD/Centrify. CDH is composed of many open source projects, and as a result, not all of them support the same set of authentication mechanisms. The supported mechanisms are simple, Kerberos, LDAP and SAML. AD integration: it is likely your enterprise is already using Active Directory for user identity control. --- Use SSSD instead of LdapGroupsMapping. --- Create a dedicated OU for the cluster. --- Use LDAP over SSL. You need to select a good search base, so that AD returns quickly; a slow lookup can stop all operations. LDAP authentication can be used for CM, Hue, Hive and Impala. The latency of LDAP requests and responses is critical for cluster performance.
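Because slow directory lookups can stall the cluster, it can help to measure them on a cluster host. The following is a minimal sketch, assuming user lookups go through the operating system's name service (for example via SSSD); the helper names `time_lookup` and `is_slow` and the 0.5-second threshold are hypothetical choices for illustration, not Cloudera recommendations.

```python
import time


def time_lookup(lookup, name, repeats=5):
    """Return the slowest latency in seconds of lookup(name) over `repeats` calls."""
    worst = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            lookup(name)
        except KeyError:
            pass  # unknown user; the lookup round trip still took place
        worst = max(worst, time.perf_counter() - start)
    return worst


def is_slow(latency_s, threshold_s=0.5):
    """Flag a lookup as slow; the threshold is an assumed value to tune."""
    return latency_s > threshold_s


if __name__ == "__main__":
    import pwd  # Unix-only standard library module

    latency = time_lookup(pwd.getpwnam, "root")
    print(f"slowest lookup: {latency:.4f}s, slow={is_slow(latency)}")
```

The slowest of several calls is reported because caching layers can make repeat lookups look faster than the first, uncached one.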
  19. User identity can be forged easily. It is okay to have an unsecured dev cluster or PoC cluster.
  20. This should be the _minimal_ security requirement for any production cluster. Kerberos is a cryptographic authentication mechanism. Key Distribution Center (KDC). Kerberos principal to user name mapping. Simple authentication = no authentication. Time synchronization: NTP. Keytab handling: a keytab stores a principal’s long-term keys (derived from its password) and is required for Hadoop services. https://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_s3_cm_principal.html CM makes it extremely easy.
  21. This should be the _minimal_ security requirement for any production cluster. Kerberos is a cryptographic authentication mechanism. Kerberos principal to user name mapping. Simple authentication = no authentication. Time synchronization: NTP. Keytab handling: a keytab stores a principal’s long-term keys (derived from its password) and is required for Hadoop services. https://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_s3_cm_principal.html CM makes it extremely easy.
  22. This should be the _minimal_ security requirement for any production cluster. Kerberos is a cryptographic authentication mechanism. Kerberos principal to user name mapping. Simple authentication = no authentication. Time synchronization: NTP. Keytab handling: a keytab stores a principal’s long-term keys (derived from its password) and is required for Hadoop services. https://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_s3_cm_principal.html CM makes it extremely easy.
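To make the "Kerberos to user name mapping" and "time synchronization" points concrete, here is a simplified sketch. It assumes only the default-style mapping (take the first component of the principal when the realm matches the cluster's default realm) and a 300-second clock-skew tolerance, which is the usual MIT Kerberos default; real deployments may use custom mapping rules. The function names and `EXAMPLE.COM` realm are hypothetical.

```python
def parse_principal(principal):
    """Split 'primary/instance@REALM' into (primary, instance, realm)."""
    name, sep, realm = principal.rpartition("@")
    if not sep:  # no realm given
        name, realm = principal, ""
    primary, _, instance = name.partition("/")
    return primary, instance, realm


def short_name(principal, default_realm):
    """Map a principal to a local user name (simplified default-style rule)."""
    primary, _instance, realm = parse_principal(principal)
    if realm and realm != default_realm:
        raise ValueError(f"no mapping rule for realm {realm!r}")
    return primary


def within_clock_skew(client_ts, kdc_ts, max_skew_s=300):
    """True if two Unix timestamps differ by no more than the allowed skew."""
    return abs(client_ts - kdc_ts) <= max_skew_s


if __name__ == "__main__":
    print(short_name("hdfs/nn1.example.com@EXAMPLE.COM", "EXAMPLE.COM"))
    print(within_clock_skew(1_000_000, 1_000_400))
```

A service principal such as `hdfs/nn1.example.com@EXAMPLE.COM` maps to the local user `hdfs`, and hosts whose clocks drift beyond the tolerance will fail authentication, which is why NTP matters.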
  23. Authentication is a prerequisite of authorization. Access control lists (ACLs) restrict who can submit work to dynamic resource pools and administer them.
  24. Cloudera Navigator: enable audit collection; audit log retention. Provenance use case: a number of business decisions and transactions rely on the verifiability of the data used in those decisions and transactions. Data-verification questions might include: How was this mortgage credit score computed? How can I prove that this number on a sales report is correct? What data sources were used in this calculation? Auditing use case: What was a specific user doing on a specific day? Who deleted a particular directory? What happened to data in a production database, and why is it no longer available?
  25. A backup/DR cluster that is purely for DR purpose (replicates between multiple untrusted Kerberos realms) https://blog.cloudera.com/blog/2016/08/considerations-for-production-environments-running-cloudera-backup-and-disaster-recovery-for-apache-hive-and-hdfs/
  26. One Kerberos realm per cluster. BDR runs from the destination. You must configure the destination realm to trust the source realm. The DR cluster should not be used for any purpose other than DR.
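  27. AES/CTR/NoPadding is a cipher transformation: the AES block cipher in counter (CTR) mode. CTR turns the block cipher into a stream cipher, so no padding is needed.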
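The sketch below illustrates why counter mode needs no padding: a keystream is generated block by block from a key, a nonce and a counter, then XORed with the data, so the output is exactly as long as the input. Note the loud assumption: SHA-256 stands in for the AES block encryption here purely so the example runs with the Python standard library. It is not AES and must not be used to protect data.

```python
import hashlib

BLOCK = 16  # AES block size in bytes


def keystream_block(key, nonce, counter):
    """Stand-in for AES-encrypting (nonce || counter); NOT real AES."""
    data = key + nonce + counter.to_bytes(8, "big")
    return hashlib.sha256(data).digest()[:BLOCK]


def ctr_xor(key, nonce, data):
    """Encrypt or decrypt: XOR data with the counter-mode keystream."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        ks = keystream_block(key, nonce, offset // BLOCK)
        chunk = data[offset:offset + BLOCK]  # the last chunk may be short
        out.extend(b ^ k for b, k in zip(chunk, ks))
    return bytes(out)


if __name__ == "__main__":
    key, nonce = b"k" * 16, b"n" * 8
    message = b"21 bytes of plaintext"
    ciphertext = ctr_xor(key, nonce, message)
    print(len(message), len(ciphertext))
    print(ctr_xor(key, nonce, ciphertext) == message)
```

The same function decrypts, because XORing with the same keystream twice restores the original bytes.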
  28. At-rest encryption is required by PCI-DSS, FISMA and HIPAA. Separation of duties (NameNode vs KMS): the HDFS superuser cannot decrypt keys. At-rest encryption is more complex than in-transit encryption: because the key is typically not updated for a long time, a more complex mechanism is needed to protect keys. An encryption zone can only be created on an empty directory; the workaround is to create the zone and then run hadoop distcp to copy existing files into it. Supports at most 256-bit encryption. “Always-on encryption zone”/“nested encryption zone” support arrived in CDH 5.7 but has no CM support, i.e. it doesn’t work end-to-end.
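The empty-directory constraint and the distcp workaround can be sketched as an ordered list of commands. This helper only builds and prints the command lines and does not run them; the key name and paths are hypothetical placeholders, and in practice key creation and zone creation are typically done under different identities (a key administrator and the HDFS superuser) to preserve the separation of duties.

```python
import shlex


def encryption_zone_commands(key_name, zone_path, source_path):
    """Return the commands, in order, to move existing data into a new zone."""
    return [
        # 1. Create the zone key in the KMS.
        ["hadoop", "key", "create", key_name],
        # 2. Create a new, empty directory to become the zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # 3. Turn the empty directory into an encryption zone.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        # 4. Copy the existing, unencrypted data into the zone.
        ["hadoop", "distcp", source_path, zone_path],
    ]


if __name__ == "__main__":
    for argv in encryption_zone_commands("mykey", "/data/secure", "/data/plain"):
        print(shlex.join(argv))
```

Once the copy is verified, the original unencrypted directory still has to be removed separately.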
  29. https://www.cloudera.com/documentation/enterprise/latest/topics/encryption_ref_arch.html Deployment considerations: at least 2 KMS proxies and at least 2 Key Trustee Servers (KTS). KTS should be a separate cluster; the two clusters are protected by a firewall. Key Trustee Servers are active-passive: if the active is down, the passive is able to serve reads, but not writes. Each Key Trustee Server should be on its own box. KTS HA: if either one fails, only reads are allowed. This does not affect reading or writing encrypted files, but you can’t create encryption zones. You may have more than 2 KMS proxies for load-balancing purposes. KMS is CPU-intensive, so use hardware equivalent to NameNode hardware. Hardware security module (HSM).
  30. Resource planning and requirements. Deployment considerations: at least 2 KMS proxies and at least 2 Key Trustee Servers (KTS), for a total of 4 hosts. KTS should be a separate cluster; the two clusters are protected by a firewall. Key Trustee Servers are active-passive: if the active is down, the passive is able to serve reads, but not writes. Each Key Trustee Server should be on its own box. KTS HA: if either one fails, only reads are allowed. This does not affect reading or writing encrypted files, but you can’t create encryption zones. You may have more than 2 KMS proxies for load-balancing purposes. KMS is CPU-intensive, so use hardware equivalent to NameNode hardware. Hardware security module (HSM).
  31. Deployment considerations: at least 2 KMS proxies and at least 2 Key Trustee Servers (KTS). KTS should be a separate cluster; the two clusters are protected by a firewall. Key Trustee Servers are active-passive: if the active is down, the passive is able to serve reads, but not writes. Each Key Trustee Server should be on its own box. KTS HA: if either one fails, only reads are allowed. This does not affect reading or writing encrypted files, but you can’t create encryption zones. You may have more than 2 KMS proxies for load-balancing purposes. KMS is CPU-intensive, so use hardware equivalent to NameNode hardware. Hardware security module (HSM).
  32. https://cloudera.app.box.com/files/0/f/6321638305/1/f_56252438130 TPC-DS. Misconfiguration: use AES/CTR/NoPadding as the Data Transfer Encryption Algorithm; default is 128-bit/256-bit (managed by CM). Low entropy: check /proc/sys/kernel/random/entropy_avail. Hardware acceleration. OpenSSL library. Entropy configuration.
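The entropy check mentioned above can be scripted. This is a minimal sketch that reads the kernel's entropy estimate from the path named in the note; the threshold of 500 is an assumed value for illustration and should be replaced with whatever your documentation or security team specifies.

```python
ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"


def read_entropy(path=ENTROPY_PATH):
    """Return the kernel's available-entropy estimate as an integer."""
    with open(path) as handle:
        return int(handle.read().strip())


def is_low(entropy, threshold=500):
    """Flag low entropy; the threshold is an assumption, not an official limit."""
    return entropy < threshold


if __name__ == "__main__":
    value = read_entropy()
    print(f"entropy_avail={value}, low={is_low(value)}")
```

Running this periodically on hosts that perform a lot of cryptographic work shows whether entropy is consistently low rather than just momentarily drained.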
  33. One of the characteristics of the Hadoop platform is that a variety of tools can access the same set of data. For example, MapReduce, Hive, Impala, Pig and 3rd-party software can all access HDFS, so unified access control is crucial. Pig, Sqoop and Kafka are also supported by Sentry. If Impala is used, Sentry is a must. By default, Impala can be accessed by user impala. 3rd-party BI tools may not support Sentry; they must go through HiveServer2 for access to be enforced. Migrating from no Sentry to Sentry is a tremendous amount of work, and hard to roll back.
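Unified access control of this kind is role-based: users belong to groups, groups are granted roles, and roles hold privileges on objects. The toy model below illustrates only that chain. It is not Sentry's API, and the group, role and database names are made up for the example.

```python
# Hypothetical grants: group -> roles, role -> (database, action) privileges.
GROUP_ROLES = {"analysts": {"analyst_role"}}
ROLE_PRIVILEGES = {"analyst_role": {("sales", "SELECT")}}


def is_allowed(user_groups, database, action):
    """True if any role granted to any of the user's groups holds the privilege."""
    for group in user_groups:
        for role in GROUP_ROLES.get(group, ()):
            if (database, action) in ROLE_PRIVILEGES.get(role, ()):
                return True
    return False


if __name__ == "__main__":
    print(is_allowed(["analysts"], "sales", "SELECT"))
    print(is_allowed(["analysts"], "sales", "INSERT"))
```

Because the decision depends only on group membership, every tool that consults the same policy store gives the same answer for the same user.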
  34. In regulated industries, regulations such as PCI or HIPAA require redaction of PII (such as SSNs). https://www.cloudera.com/documentation/enterprise/latest/topics/sg_redaction.html https://blog.cloudera.com/blog/2015/06/new-in-cdh-5-4-sensitive-data-redaction/
  35. In regulated industries, regulations such as PCI or HIPAA require redaction of PII (such as SSNs). https://www.cloudera.com/documentation/enterprise/latest/topics/sg_redaction.html https://blog.cloudera.com/blog/2015/06/new-in-cdh-5-4-sensitive-data-redaction/
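Redaction of this kind is usually pattern-based: text such as a log line or query string is searched for something that looks like sensitive data, and the match is replaced with a mask. The sketch below shows the idea for US Social Security numbers written as ddd-dd-dddd. The pattern and mask are illustrative assumptions, not the rules shipped with the product.

```python
import re

# Matches SSNs written as three digits, two digits, four digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssn(text, mask="XXX-XX-XXXX"):
    """Replace every SSN-shaped substring in text with the mask."""
    return SSN_PATTERN.sub(mask, text)


if __name__ == "__main__":
    query = "SELECT * FROM customers WHERE ssn='123-45-6789'"
    print(redact_ssn(query))
```

A real rule set would cover more formats (for example SSNs without dashes, or credit card numbers) and would be applied to logs and stored queries alike.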
  36. Intermediate files. Certain services may write spilled data outside HDFS, on local disk, so additional configuration is required to ensure that data is encrypted as well. Navigator Encrypt is a kernel module that intercepts I/O requests to encrypted datastores, including log files, config files, temp files and databases.
  37. Other references: https://cloudera.app.box.com/files/0/s/firewall/1/f_202846938208 Ben and Joey were both long-time Cloudera Solution Architects.