Speaker: Eddie Garcia
Open data is quickly gaining momentum, and when applied as data for good it becomes a much more powerful concept that we should all consider as good data stewards. Organizations and cities are starting to share data such as traffic conditions and climate sensor readings, allowing others to use this open data to improve quality of life. But could this same open data be used for more nefarious purposes? Very likely. In this session we will look at how to strike a balance between sharing data and locking it down, and why security should be on by default to protect all data, including open data.
Good morning everyone, I am very excited to be here presenting for you today and I hope you have enjoyed Strata so far.
Everyone in this room is generating data - even if you are not moving at all, you are still generating data
Data is what we use every day to decide what time to leave for work
Organizations use it to decide how to improve their revenues
Governments use it to provide better services to their citizens
It’s no wonder it’s growing at exponential rates
All of this data opens new possibilities to everyone, everywhere, at any time
Today, too much data is not a problem. We all know that you can never have too much of it.
More data can also create more risk. With more types of data coming in from more sources, and being accessed by more users, at faster rates than ever before – the opportunities for risk are also growing.
Security must be on everyone’s mind as we propel forward into our data-driven world. The opportunities that data can drive are endless, and data has too much potential to be tainted by attacks or slowed down by doubts and fears.
Especially as more of this data is opened and shared for the greater good. Open data is already a trend we have seen, with organizations like the Open Data Network and cities like Kansas City using open data to provide information on housing, public spaces, and other government initiatives. These open data programs are expanding the possibilities of data, but they’re also compounding the risks of privacy breaches and public attacks. Data security needs to be addressed not only for corporate data, but also for exposed open data.
This is possible.
The technology and algorithms that give us the means to verify data integrity already exist today, and a lot of progress has been made in protecting big data.
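As one small illustration of the kind of integrity check that already exists, here is a minimal sketch using a SHA-256 digest from Python's standard library. The sample dataset and the function names are hypothetical, chosen only for this example; a publisher would share the digest alongside the data so that any consumer can confirm the copy they received was not altered.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_digest(data: bytes, expected_hex: str) -> bool:
    """Check the data against a published digest using a constant-time comparison."""
    return hmac.compare_digest(sha256_digest(data), expected_hex)


# Hypothetical open dataset: the publisher computes and publishes the digest.
published = b"station,temp_c\nA1,21.5\n"
digest = sha256_digest(published)

# A consumer recomputes the digest over the bytes they downloaded.
unchanged_ok = verify_digest(published, digest)
tampered_ok = verify_digest(published.replace(b"21.5", b"35.0"), digest)
```

A plain digest only detects changes if the digest itself is obtained from a trusted place; it says nothing about who produced the data.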
To drive this security, we need to join together to create the new standard. Those of you in the security space may be familiar with the concept of “secure-by-default”. Today, I want to introduce the idea of secure-by-default data. Together, right here and right now, let’s make this the new standard. This is important.
Let’s start with the definition. I define secure-by-default data as data that contains, within itself, the properties that allow systems to secure its confidentiality and integrity.
Data that comes with a built in security contract between its producers and consumers
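To make the idea of data that carries its own security contract a little more concrete, here is a rough sketch, not a proposed standard. It assumes a hypothetical record layout in which the payload travels together with a policy section and an HMAC-SHA256 tag covering both. The field names (`policy`, `payload`, `tag`), the function names, and the shared secret key are all assumptions made for this illustration. Confidentiality (encryption) is left out, and open data published to the world would more plausibly use public-key signatures than a shared key.

```python
import hashlib
import hmac
import json


def _canonical(body: dict) -> bytes:
    """Serialize deterministically so producer and consumer hash identical bytes."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal(payload: dict, policy: dict, key: bytes) -> dict:
    """Producer side: bundle payload and policy, then attach an integrity tag."""
    body = {"policy": policy, "payload": payload}
    tag = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {**body, "tag": tag}


def open_sealed(record: dict, key: bytes) -> dict:
    """Consumer side: recompute the tag and refuse the data if it does not match."""
    body = {"policy": record["policy"], "payload": record["payload"]}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, record["tag"]):
        raise ValueError("integrity check failed")
    return body["payload"]


# Hypothetical usage: a city publishes a sensor reading with its policy attached.
key = b"demo-shared-secret"  # assumption: key distribution is out of scope here
record = seal({"sensor": "A1", "temp_c": 21.5}, {"classification": "public"}, key)
reading = open_sealed(record, key)
```

Because the tag covers the policy as well as the payload, a consumer can detect a change to either one, which is the sense in which the contract travels with the data.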
The time to establish these standards for secure-by-default data formats is now
We have already seen standards being developed within all aspects of Big Data, from data formats like Apache Avro and Apache Parquet to unified access controls with Apache Sentry. We need to establish the next standards, the new formats that will enable secure-by-default data. The data formats that enable data governance inside and outside of Hadoop.
Together, we can leverage this same open data to develop and test secure-by-default data formats. Not only will that help accelerate development, it will also drive adoption of these standards across the enterprise, commercial, and public sectors, and ensure that any and all data is secure and protected, whether it sits within a private corporation or is being opened and shared for greater public use and benefit.
At Cloudera, we have seen the importance of securing and governing data, and we are thrilled that much of the community, our partners, and our customers have helped us drive some critical improvements. Together with Intel, we have brought encryption technology to Hadoop that allows us to encrypt all data with an unnoticeable impact on performance. We have made Hadoop data governance a reality for our customers, giving them unprecedented visibility into and control over how their data is being used and how it’s changing. This was also the year we saw Hadoop meet even the most stringent security regulations, with Cloudera’s platform becoming the first and only Hadoop platform to achieve PCI compliance (a regulatory requirement for organizations handling credit card information, similar to regulations like HIPAA that protect patient data).
The possibilities for data are just beginning. Through open and secure data, we will see data truly at its best: used for the greater good and to protect our environment.
Big data is already playing a part in this. High fidelity information obtained by instrumenting the entire energy chain from creation to consumption will allow us to be much smarter about how we create and use energy. Data can help us get smarter, and change behavior, both institutionally and individually.
Data can not only help us optimize our use of existing energy systems, but also help us transition to new renewable technologies.
Open and secure data improves public health while protecting the most private of information.
Augmenting existing subjective measurements of patients with objective measurements from device data, and making such data available to the medical community, has the potential to transform our understanding of disease. For example, the Michael J. Fox Foundation and Intel are using wearables to look at the measurable features of Parkinson’s, such as slowness of movement, tremor, and sleep quality, which may enable researchers to assemble a better picture of the clinical progression of the disease. In the spirit of using the combined power of humans and computers, they are correlating data collected in clinical observations with patient diaries.
Open and secure data improves the education of future generations, giving everyone access to the tools and knowledge they need, no matter where they are located, while also improving education as a whole: from personalized lesson plans that adjust to a student’s ability, to recommendations of the right type of content to ensure success. By educating our future, we can ensure continued innovation around open data, for purposes yet to be explored.
All of us here have a responsibility. We all must protect this data. We are all responsible for being good data stewards. Secure-by-default data needs you.
CTA for talking with organizations/driving open development/making this the standard
[Should we say something about using the rest of this show to meet with others to start collaborating?]