As companies seek to derive business value from big data through exploratory analytics, they need the agility to access and explore that data. But agility can turn into chaos if the need for self-service data exploration isn't balanced with data security and data governance. It's not enough to simply load data into Hadoop and put self-service tools in the hands of users – you must consider your data governance and security policies as well. Data access permissions and regulatory compliance demand confidentiality safeguards for sensitive data such as personally identifiable information. Sensitive data going into Hadoop must be identified and protected before users can access it.
In this webinar, Dale Kim from MapR and Oliver Claude from Waterline Data offer a best-practices approach to deploying a governed data lake. They discuss how to:
- Automatically discover business and compliance metadata, audit history, and data lineage to support regulatory compliance.
- Build an automated field-level metadata discovery and business glossary.
- Ensure field-level data quality.
- Protect data with authentication, authorization, and encryption security features.
- Prepare for auditing by logging all actions, including data access, in your system.
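The last point, logging every data access for later auditing, can be sketched in a few lines. This is a minimal illustration, not MapR's or Waterline's implementation; the `read_dataset` function and the JSON record fields are hypothetical stand-ins for whatever access path and audit schema your platform uses.

```python
import json
import logging
from datetime import datetime, timezone
from functools import wraps

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("audit")

def audited(action):
    """Decorator that emits a structured audit record before each call."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            record = {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "user": user,
                "action": action,
                "target": args[0] if args else None,
            }
            audit_log.info(json.dumps(record))  # ship to your audit store
            return func(user, *args, **kwargs)
        return wrapper
    return decorator

@audited("read")
def read_dataset(user, path):
    # Hypothetical placeholder for an actual file or table read.
    return f"contents of {path}"
```

Writing audit records as structured JSON, rather than free-form log lines, makes it straightforward to query access history by user, action, or dataset when compliance questions arise.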