SlideShare une entreprise Scribd logo
1  sur  20
Télécharger pour lire hors ligne
RUNNING AN APACHE PROJECT:
10 TRAPS AND HOW TO AVOID
THEM
Owen O’Malley
omalley@apache.org
September 2019
@owen_omalley
© 2019 Cloudera, Inc. All rights reserved. 2
WHO AM I?
• First committer added to Hadoop
 Working at Yahoo in 2006
 Original VP when Hadoop became a TLP
• Committer & PMC member on
 Ambari
 Hadoop
 Hive
 ORC
• Mentor for: Giraph, Kafka, Knox, Kylin, Iceberg, Metron, Ranger, Reef, & Tez
GETTING STARTED
© 2019 Cloudera, Inc. All rights reserved. 4
MISTAKE 1. STARTING ON GITHUB WITHOUT AN IP AGREEMENT
• Starting a project as GitHub repositories is very easy
Start developing immediately!
Allows building community before entering Apache
• But… getting lawyers to sign a code grant takes time
Need legal sign offs from each contributors company
Apache requires code grants for large chunks of code
© 2019 Cloudera, Inc. All rights reserved. 5
MISTAKE 1. STARTING ON GITHUB WITHOUT AN IP AGREEMENT (CONT.)
• When Hortonworks & Vertica developed a C++ ORC reader
Always intended to move to Apache
Only two companies
Still took a couple months to get the code grant signed
• Get IP agreements before committing code!
© 2019 Cloudera, Inc. All rights reserved. 6
MISTAKE 2. KEEPING YOUR PROJECT SECRET
• Open source is a vibrant ecosystem
Projects fill niches in that ecosystem
Creates choices for users and developers
• Your project is competing with many others
Apache doesn’t pick winners and losers
Fighting for attention
© 2019 Cloudera, Inc. All rights reserved. 7
MISTAKE 2. KEEPING YOUR PROJECT SECRET (CONT.)
• Advertise your project!
• Make building a good project website a priority!
Take down the old site
• Give conference talks
Tell users about new features
• Write blogs
Use cases
Experience in production
© 2019 Cloudera, Inc. All rights reserved. 8
MISTAKE 3. NOT FOSTERING DIVERSITY
• Apache is of two minds with respect to employers
Apache doesn’t care who your employer is.
Projects should encourage a diverse set of employers
• Your karma is yours, not your employer’s
Expected to keep your “hats” separate
• Avoid group-think
More voices and viewpoints are very very good
Happy users make your project grow
© 2019 Cloudera, Inc. All rights reserved. 9
MISTAKE 3. NOT FOSTERING DIVERSITY (CONT.)
• Don’t assume all smart people work at your company
Innovation happens everywhere
• Separate the company’s goals from the project’s
Don’t shoot down proposals because
• They compete with your proprietary products
• Would create work for your proprietary products
• Don’t promise features in upcoming versions
© 2019 Cloudera, Inc. All rights reserved. 10
MISTAKE 4. HOLDING FACE TO FACE DEVELOPER MEETINGS
• Excludes remote people
• Even video meetings are hard for different time zones
• Holding roadmap meetings are particularly problematic
Need to ensure full access to the community
Bring discussion back to the email list before finalizing
• When writing to the lists, use “I” instead of “we”
You are presenting your opinion, not a group’s
• Make your project website welcoming & helpful
© 2019 Cloudera, Inc. All rights reserved. 11
MISTAKE 5. INCLUDING BINARY RELEASE ARTIFACTS
• Many projects make source and binary release artifacts
• Binary artifacts are hard to review and get right
Make reproducible builds
Licensing for binary artifacts is the transitive closure
Watch Docker file artifact versions
• Far better to make only source release artifacts
Can make convivence binary artifacts after release vote
Even better is to make downstream binary artifacts
KEEP GOING
© 2019 Cloudera, Inc. All rights reserved. 13
MISTAKE 6. HOLDING A HIGH BAR FOR COMMITTER AND PMC MEMBERS
• Some projects require a lot of patches to make committer
• Much worse if project has a large patch queue
Really hard if you don’t know or work with a committer
3.6k uncommitted patches on Hadoop’s Jiras since 2006
• A committer shortage makes the patch queue worse
• Most important for becoming a committer should be:
Good technical taste
Knowing their own limits
© 2019 Cloudera, Inc. All rights reserved. 14
MISTAKE 7. IGNORING TRADEMARKS
• Make sure that your project doesn’t use another trademark
Often comes down to a judgement call about risks
Changing early is much better than later
• Ensure people don’t abuse your trademark
Very hard if it is a user/non-project member
Project members need to fix their company’s behavior
• Board has removed PMC members
• Hold training classes for engineers and marketing
© 2019 Cloudera, Inc. All rights reserved. 15
MISTAKE 8. NOT REWARDING OPEN SOURCE WORK
• If employees work on open source projects:
• Make and measure the engineer’s objectives reflect this
Code contributions
Documentation contributions
Code reviews – include other companies’ patches
Conference presentations
• Make the managers’ goals also reflect the community time
© 2019 Cloudera, Inc. All rights reserved. 16
MISTAKE 9. STEALTH DEVELOPMENT
• Developing off-line breaks the community
Cuts community off from participation
Forward motion stops & release train stalls
• Yahoo developed Hadoop Security privately
0.18, 0.19, 0.20 were ~3-4 months
0.21 was 12 months, Facebook & LinkedIn forked
0.20.203 was 24 months from 0.20
1.0 was 8 more months
© 2019 Cloudera, Inc. All rights reserved. 17
MISTAKE 10. LICENSING PROBLEMS
• Your project should have source with permissive licenses
Eg. Apache, BSD, MIT
• Can have binary dependency on weak-copyleft licenses
Eg. Eclipse, Mozilla, Creative Commons Attribution
• Can only use Category X in very specific cases
Eg. GPL, LGPL, JSON, CC-BY-A
Must be an optional, build tool, or system provided
• Includes recursive dependencies!
© 2019 Cloudera, Inc. All rights reserved. 18
MISTAKE 10. LICENSING PROBLEMS (CONT.)
• Can sneak up on you
Updated aircompressor dependency from 0.10 to 0.15
Started using the new zstd codec
Previously excluded slice dependency was now required
Slice depends on jol-core, which is GPL.
• Fortunately, the use of slice was for one method
Worked with aircompressor to get a fix
• Copying code from stack overflow has the same problem!
© 2019 Cloudera, Inc. All rights reserved. 19
SUMMARY
• Build your community
Give talks
Make the website
Make the project easy to build
Make the community friendly – look at Beam talk
• Do your work in the open
• Train your employees in Apache & open source
Hortonworks training - https://s.apache.org/apache-training
THANK YOU
Owen O’Malley
omalley@apache.org
@owen_omalley

Contenu connexe

Tendances

A First-Hand Look at What's New in HDP 2.3
A First-Hand Look at What's New in HDP 2.3 A First-Hand Look at What's New in HDP 2.3
A First-Hand Look at What's New in HDP 2.3 DataWorks Summit
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionSteve Loughran
 
TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Rangertrihug
 
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Kevin Minder
 
Hadoop Security and Compliance - StampedeCon 2016
Hadoop Security and Compliance - StampedeCon 2016Hadoop Security and Compliance - StampedeCon 2016
Hadoop Security and Compliance - StampedeCon 2016StampedeCon
 
Securing Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments Using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments Using Apache RangerDataWorks Summit
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on DockerRakesh Saha
 
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxFortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxDataWorks Summit
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoopWei-Chiu Chuang
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentBlueData, Inc.
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Hortonworks
 
Ranger admin dev overview
Ranger admin dev overviewRanger admin dev overview
Ranger admin dev overviewTushar Dudhatra
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Clusterahortonworks
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupAndrei Savu
 
Successes, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to HadoopSuccesses, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to HadoopDataWorks Summit/Hadoop Summit
 
Secure Hadoop as a Service - Session Sponsored by Intel
Secure Hadoop as a Service - Session Sponsored by IntelSecure Hadoop as a Service - Session Sponsored by Intel
Secure Hadoop as a Service - Session Sponsored by IntelAmazon Web Services
 

Tendances (20)

A First-Hand Look at What's New in HDP 2.3
A First-Hand Look at What's New in HDP 2.3 A First-Hand Look at What's New in HDP 2.3
A First-Hand Look at What's New in HDP 2.3
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
 
TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Ranger
 
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
 
Hadoop Security and Compliance - StampedeCon 2016
Hadoop Security and Compliance - StampedeCon 2016Hadoop Security and Compliance - StampedeCon 2016
Hadoop Security and Compliance - StampedeCon 2016
 
Securing Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments Using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
 
Hadoop Operations
Hadoop OperationsHadoop Operations
Hadoop Operations
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on Docker
 
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxFortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoop
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environment
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
 
DataOps with Project Amaterasu
DataOps with Project AmaterasuDataOps with Project Amaterasu
DataOps with Project Amaterasu
 
Ranger admin dev overview
Ranger admin dev overviewRanger admin dev overview
Ranger admin dev overview
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Cluster
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data Meetup
 
Successes, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to HadoopSuccesses, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
 
Secure Hadoop as a Service - Session Sponsored by Intel
Secure Hadoop as a Service - Session Sponsored by IntelSecure Hadoop as a Service - Session Sponsored by Intel
Secure Hadoop as a Service - Session Sponsored by Intel
 

Similaire à Running An Apache Project: 10 Traps and How to Avoid Them

Mainframe DevOps: A Zowe CLI-enabled Roadmap
Mainframe DevOps: A Zowe CLI-enabled RoadmapMainframe DevOps: A Zowe CLI-enabled Roadmap
Mainframe DevOps: A Zowe CLI-enabled RoadmapDevOps.com
 
OSSF 2018 - Colin Charles of GrokOpen - Community vs. enterprise how not to ...
OSSF 2018 - Colin Charles of GrokOpen - Community vs. enterprise  how not to ...OSSF 2018 - Colin Charles of GrokOpen - Community vs. enterprise  how not to ...
OSSF 2018 - Colin Charles of GrokOpen - Community vs. enterprise how not to ...FINOS
 
Selecting an Open Source License and Business Model for Your Project to Have ...
Selecting an Open Source License and Business Model for Your Project to Have ...Selecting an Open Source License and Business Model for Your Project to Have ...
Selecting an Open Source License and Business Model for Your Project to Have ...All Things Open
 
Emerging trends in data analytics
Emerging trends in data analyticsEmerging trends in data analytics
Emerging trends in data analyticsWei-Chiu Chuang
 
Emulators as an Emerging Best Practice for API Providers
Emulators as an Emerging Best Practice for API ProvidersEmulators as an Emerging Best Practice for API Providers
Emulators as an Emerging Best Practice for API ProvidersCisco DevNet
 
DevOps Patterns to Enable Success in Microservices
DevOps Patterns to Enable Success in MicroservicesDevOps Patterns to Enable Success in Microservices
DevOps Patterns to Enable Success in MicroservicesRich Mills
 
GitOps, Jenkins X &Future of CI/CD
GitOps, Jenkins X &Future of CI/CDGitOps, Jenkins X &Future of CI/CD
GitOps, Jenkins X &Future of CI/CDRakuten Group, Inc.
 
Hack for Good and Profit (Cloud Foundry Summit 2014)
Hack for Good and Profit (Cloud Foundry Summit 2014)Hack for Good and Profit (Cloud Foundry Summit 2014)
Hack for Good and Profit (Cloud Foundry Summit 2014)VMware Tanzu
 
New in the Visual Studio 2012 IDE
New in the Visual Studio 2012 IDENew in the Visual Studio 2012 IDE
New in the Visual Studio 2012 IDELearnNowOnline
 
CI/CD Best Practices for Your DevOps Journey
CI/CD Best  Practices for Your DevOps JourneyCI/CD Best  Practices for Your DevOps Journey
CI/CD Best Practices for Your DevOps JourneyDevOps.com
 
Introducing Cloud Foundry Integration for Eclipse (Cloud Foundry Summit 2014)
Introducing Cloud Foundry Integration for Eclipse (Cloud Foundry Summit 2014)Introducing Cloud Foundry Integration for Eclipse (Cloud Foundry Summit 2014)
Introducing Cloud Foundry Integration for Eclipse (Cloud Foundry Summit 2014)VMware Tanzu
 
Cloud Foundry Summit 2014: Introducing Cloud Foundry Integration for Eclipse
Cloud Foundry Summit 2014: Introducing Cloud Foundry Integration for EclipseCloud Foundry Summit 2014: Introducing Cloud Foundry Integration for Eclipse
Cloud Foundry Summit 2014: Introducing Cloud Foundry Integration for Eclipsedmbtr3
 
DevOps Patterns to Enable Success in Microservices
DevOps Patterns to Enable Success in MicroservicesDevOps Patterns to Enable Success in Microservices
DevOps Patterns to Enable Success in MicroservicesRich Mills
 
Oracle: Building Cloud Native Applications
Oracle: Building Cloud Native ApplicationsOracle: Building Cloud Native Applications
Oracle: Building Cloud Native ApplicationsKelly Goetsch
 
Get the Exact Identity Solution You Need - In the Cloud - Overview
Get the Exact Identity Solution You Need - In the Cloud - OverviewGet the Exact Identity Solution You Need - In the Cloud - Overview
Get the Exact Identity Solution You Need - In the Cloud - OverviewForgeRock
 
Jenkins World 2019 - Integrating jenkins x with your business
Jenkins World 2019 - Integrating jenkins x with your businessJenkins World 2019 - Integrating jenkins x with your business
Jenkins World 2019 - Integrating jenkins x with your businessMauricio (Salaboy) Salatino
 
"Portrait of the developer as The Artist" Lockheed Architect Workshop
"Portrait of the developer as The Artist" Lockheed Architect Workshop"Portrait of the developer as The Artist" Lockheed Architect Workshop
"Portrait of the developer as The Artist" Lockheed Architect WorkshopPatrick Chanezon
 
Improving Your Apache Project's Image And Brand
 Improving Your Apache Project's Image And Brand Improving Your Apache Project's Image And Brand
Improving Your Apache Project's Image And BrandShane Curcuru
 
Big Data Fundamentals 6.6.18
Big Data Fundamentals 6.6.18Big Data Fundamentals 6.6.18
Big Data Fundamentals 6.6.18Cloudera, Inc.
 

Similaire à Running An Apache Project: 10 Traps and How to Avoid Them (20)

Mainframe DevOps: A Zowe CLI-enabled Roadmap
Mainframe DevOps: A Zowe CLI-enabled RoadmapMainframe DevOps: A Zowe CLI-enabled Roadmap
Mainframe DevOps: A Zowe CLI-enabled Roadmap
 
OSSF 2018 - Colin Charles of GrokOpen - Community vs. enterprise how not to ...
OSSF 2018 - Colin Charles of GrokOpen - Community vs. enterprise  how not to ...OSSF 2018 - Colin Charles of GrokOpen - Community vs. enterprise  how not to ...
OSSF 2018 - Colin Charles of GrokOpen - Community vs. enterprise how not to ...
 
Selecting an Open Source License and Business Model for Your Project to Have ...
Selecting an Open Source License and Business Model for Your Project to Have ...Selecting an Open Source License and Business Model for Your Project to Have ...
Selecting an Open Source License and Business Model for Your Project to Have ...
 
Emerging trends in data analytics
Emerging trends in data analyticsEmerging trends in data analytics
Emerging trends in data analytics
 
Emulators as an Emerging Best Practice for API Providers
Emulators as an Emerging Best Practice for API ProvidersEmulators as an Emerging Best Practice for API Providers
Emulators as an Emerging Best Practice for API Providers
 
DevOps Patterns to Enable Success in Microservices
DevOps Patterns to Enable Success in MicroservicesDevOps Patterns to Enable Success in Microservices
DevOps Patterns to Enable Success in Microservices
 
GitOps, Jenkins X &Future of CI/CD
GitOps, Jenkins X &Future of CI/CDGitOps, Jenkins X &Future of CI/CD
GitOps, Jenkins X &Future of CI/CD
 
Hack for Good and Profit (Cloud Foundry Summit 2014)
Hack for Good and Profit (Cloud Foundry Summit 2014)Hack for Good and Profit (Cloud Foundry Summit 2014)
Hack for Good and Profit (Cloud Foundry Summit 2014)
 
New in the Visual Studio 2012 IDE
New in the Visual Studio 2012 IDENew in the Visual Studio 2012 IDE
New in the Visual Studio 2012 IDE
 
CI/CD Best Practices for Your DevOps Journey
CI/CD Best  Practices for Your DevOps JourneyCI/CD Best  Practices for Your DevOps Journey
CI/CD Best Practices for Your DevOps Journey
 
Introducing Cloud Foundry Integration for Eclipse (Cloud Foundry Summit 2014)
Introducing Cloud Foundry Integration for Eclipse (Cloud Foundry Summit 2014)Introducing Cloud Foundry Integration for Eclipse (Cloud Foundry Summit 2014)
Introducing Cloud Foundry Integration for Eclipse (Cloud Foundry Summit 2014)
 
Cloud Foundry Summit 2014: Introducing Cloud Foundry Integration for Eclipse
Cloud Foundry Summit 2014: Introducing Cloud Foundry Integration for EclipseCloud Foundry Summit 2014: Introducing Cloud Foundry Integration for Eclipse
Cloud Foundry Summit 2014: Introducing Cloud Foundry Integration for Eclipse
 
DevOps Patterns to Enable Success in Microservices
DevOps Patterns to Enable Success in MicroservicesDevOps Patterns to Enable Success in Microservices
DevOps Patterns to Enable Success in Microservices
 
Oracle: Building Cloud Native Applications
Oracle: Building Cloud Native ApplicationsOracle: Building Cloud Native Applications
Oracle: Building Cloud Native Applications
 
Get the Exact Identity Solution You Need - In the Cloud - Overview
Get the Exact Identity Solution You Need - In the Cloud - OverviewGet the Exact Identity Solution You Need - In the Cloud - Overview
Get the Exact Identity Solution You Need - In the Cloud - Overview
 
Jenkins World 2019 - Integrating jenkins x with your business
Jenkins World 2019 - Integrating jenkins x with your businessJenkins World 2019 - Integrating jenkins x with your business
Jenkins World 2019 - Integrating jenkins x with your business
 
"Portrait of the developer as The Artist" Lockheed Architect Workshop
"Portrait of the developer as The Artist" Lockheed Architect Workshop"Portrait of the developer as The Artist" Lockheed Architect Workshop
"Portrait of the developer as The Artist" Lockheed Architect Workshop
 
Improving Your Apache Project's Image And Brand
 Improving Your Apache Project's Image And Brand Improving Your Apache Project's Image And Brand
Improving Your Apache Project's Image And Brand
 
Big Data Fundamentals 6.6.18
Big Data Fundamentals 6.6.18Big Data Fundamentals 6.6.18
Big Data Fundamentals 6.6.18
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 

Plus de Owen O'Malley

Big Data's Journey to ACID
Big Data's Journey to ACIDBig Data's Journey to ACID
Big Data's Journey to ACIDOwen O'Malley
 
Protect your private data with ORC column encryption
Protect your private data with ORC column encryptionProtect your private data with ORC column encryption
Protect your private data with ORC column encryptionOwen O'Malley
 
Fine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column EncryptionFine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column EncryptionOwen O'Malley
 
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and ParquetFast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and ParquetOwen O'Malley
 
Strata NYC 2018 Iceberg
Strata NYC 2018  IcebergStrata NYC 2018  Iceberg
Strata NYC 2018 IcebergOwen O'Malley
 
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and ParquetFast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and ParquetOwen O'Malley
 
ORC Column Encryption
ORC Column EncryptionORC Column Encryption
ORC Column EncryptionOwen O'Malley
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetOwen O'Malley
 
Protecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopProtecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopOwen O'Malley
 
Structor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop ClustersStructor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop ClustersOwen O'Malley
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
Adding ACID Updates to Hive
Adding ACID Updates to HiveAdding ACID Updates to Hive
Adding ACID Updates to HiveOwen O'Malley
 
ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013Owen O'Malley
 
ORC File Introduction
ORC File IntroductionORC File Introduction
ORC File IntroductionOwen O'Malley
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive QueriesOwen O'Malley
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop OperationsOwen O'Malley
 
Next Generation MapReduce
Next Generation MapReduceNext Generation MapReduce
Next Generation MapReduceOwen O'Malley
 

Plus de Owen O'Malley (20)

Big Data's Journey to ACID
Big Data's Journey to ACIDBig Data's Journey to ACID
Big Data's Journey to ACID
 
ORC Deep Dive 2020
ORC Deep Dive 2020ORC Deep Dive 2020
ORC Deep Dive 2020
 
Protect your private data with ORC column encryption
Protect your private data with ORC column encryptionProtect your private data with ORC column encryption
Protect your private data with ORC column encryption
 
Fine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column EncryptionFine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column Encryption
 
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and ParquetFast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
 
Strata NYC 2018 Iceberg
Strata NYC 2018  IcebergStrata NYC 2018  Iceberg
Strata NYC 2018 Iceberg
 
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and ParquetFast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
 
ORC Column Encryption
ORC Column EncryptionORC Column Encryption
ORC Column Encryption
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
 
Protecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopProtecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache Hadoop
 
Data protection2015
Data protection2015Data protection2015
Data protection2015
 
Structor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop ClustersStructor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop Clusters
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Adding ACID Updates to Hive
Adding ACID Updates to HiveAdding ACID Updates to Hive
Adding ACID Updates to Hive
 
ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013
 
ORC Files
ORC FilesORC Files
ORC Files
 
ORC File Introduction
ORC File IntroductionORC File Introduction
ORC File Introduction
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
 
Next Generation MapReduce
Next Generation MapReduceNext Generation MapReduce
Next Generation MapReduce
 

Dernier

Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdfAndrey Devyatkin
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfmaor17
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
Copilot para Microsoft 365 y Power Platform Copilot
Copilot para Microsoft 365 y Power Platform CopilotCopilot para Microsoft 365 y Power Platform Copilot
Copilot para Microsoft 365 y Power Platform CopilotEdgard Alejos
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingShane Coughlan
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 

Dernier (20)

Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
Copilot para Microsoft 365 y Power Platform Copilot
Copilot para Microsoft 365 y Power Platform CopilotCopilot para Microsoft 365 y Power Platform Copilot
Copilot para Microsoft 365 y Power Platform Copilot
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 

Running An Apache Project: 10 Traps and How to Avoid Them

  • 1. RUNNING AN APACHE PROJECT: 10 TRAPS AND HOW TO AVOID THEM Owen O’Malley omalley@apache.org September 2019 @owen_omalley
  • 2. © 2019 Cloudera, Inc. All rights reserved. 2 WHO AM I? • First committer added to Hadoop  Working at Yahoo in 2006  Original VP when Hadoop became a TLP • Committer & PMC member on  Ambari  Hadoop  Hive  ORC • Mentor for: Giraph, Kafka, Knox, Kylin, Iceberg, Metron, Ranger, Reef, & Tez
  • 4. © 2019 Cloudera, Inc. All rights reserved. 4 MISTAKE 1. STARTING ON GITHUB WITHOUT AN IP AGREEMENT • Starting a project as GitHub repositories is very easy Start developing immediately! Allows building community before entering Apache • But… getting lawyers to sign a code grant takes time Need legal sign offs from each contributors company Apache requires code grants for large chunks of code
  • 5. © 2019 Cloudera, Inc. All rights reserved. 5 MISTAKE 1. STARTING ON GITHUB WITHOUT AN IP AGREEMENT (CONT.) • When Hortonworks & Vertica developed a C++ ORC reader Always intended to move to Apache Only two companies Still took a couple months to get the code grant signed • Get IP agreements before committing code!
  • 6. © 2019 Cloudera, Inc. All rights reserved. 6 MISTAKE 2. KEEPING YOUR PROJECT SECRET • Open source is a vibrant ecosystem Projects fill niches in that ecosystem Creates choices for users and developers • Your project is competing with many others Apache doesn’t pick winners and losers Fighting for attention
  • 7. © 2019 Cloudera, Inc. All rights reserved. 7 MISTAKE 2. KEEPING YOUR PROJECT SECRET (CONT.) • Advertise your project! • Make building a good project website a priority! Take down the old site • Give conference talks Tell users about new features • Write blogs Use cases Experience in production
  • 8. © 2019 Cloudera, Inc. All rights reserved. 8 MISTAKE 3. NOT FOSTERING DIVERSITY • Apache is of two minds with respect to employers Apache doesn’t care who your employer is. Projects should encourage a diverse set of employers • Your karma is yours, not your employer’s Expected to keep your “hats” separate • Avoid group-think More voices and viewpoints are very very good Happy users make your project grow
  • 9. © 2019 Cloudera, Inc. All rights reserved. 9 MISTAKE 3. NOT FOSTERING DIVERSITY (CONT.) • Don’t assume all smart people work at your company Innovation happens everywhere • Separate the company’s goals from the project’s Don’t shoot down proposals because • They compete with your proprietary products • Would create work for your proprietary products • Don’t promise features in upcoming versions
  • 10. © 2019 Cloudera, Inc. All rights reserved. 10 MISTAKE 4. HOLDING FACE TO FACE DEVELOPER MEETINGS • Excludes remote people • Even video meetings are hard for different time zones • Holding roadmap meetings are particularly problematic Need to ensure full access to the community Bring discussion back to the email list before finalizing • When writing to the lists, use “I” instead of “we” You are presenting your opinion, not a group’s • Make your project website welcoming & helpful
  • 11. © 2019 Cloudera, Inc. All rights reserved. 11 MISTAKE 5. INCLUDING BINARY RELEASE ARTIFACTS • Many projects make source and binary release artifacts • Binary artifacts are hard to review and get right Make reproducible builds Licensing for binary artifacts is the transitive closure Watch Docker file artifact versions • Far better to make only source release artifacts Can make convivence binary artifacts after release vote Even better is to make downstream binary artifacts
  • 13. © 2019 Cloudera, Inc. All rights reserved. 13 MISTAKE 6. HOLDING A HIGH BAR FOR COMMITTER AND PMC MEMBERS • Some projects require a lot of patches to make committer • Much worse if project has a large patch queue Really hard if you don’t know or work with a committer 3.6k uncommitted patches on Hadoop’s Jiras since 2006 • A committer shortage makes the patch queue worse • Most important for becoming a committer should be: Good technical taste Knowing their own limits
  • 14. © 2019 Cloudera, Inc. All rights reserved. 14 MISTAKE 7. IGNORING TRADEMARKS • Make sure that your project doesn’t use another trademark Often comes down to a judgement call about risks Changing early is much better than later • Ensure people don’t abuse your trademark Very hard if it is a user/non-project member Project members need to fix their company’s behavior • Board has removed PMC members • Hold training classes for engineers and marketing
  • 15. © 2019 Cloudera, Inc. All rights reserved. 15 MISTAKE 8. NOT REWARDING OPEN SOURCE WORK • If employees work on open source projects: • Make and measure the engineer’s objectives reflect this Code contributions Documentation contributions Code reviews – include other companies’ patches Conference presentations • Make the managers’ goals also reflect the community time
  • 16. © 2019 Cloudera, Inc. All rights reserved. 16 MISTAKE 9. STEALTH DEVELOPMENT • Developing off-line breaks the community Cuts community off from participation Forward motion stops & release train stalls • Yahoo developed Hadoop Security privately 0.18, 0.19, 0.20 were ~3-4 months 0.21 was 12 months, Facebook & LinkedIn forked 0.20.203 was 24 months from 0.20 1.0 was 8 more months
  • 17. © 2019 Cloudera, Inc. All rights reserved. 17 MISTAKE 10. LICENSING PROBLEMS • Your project should have source with permissive licenses Eg. Apache, BSD, MIT • Can have binary dependency on weak-copyleft licenses Eg. Eclipse, Mozilla, Creative Commons Attribution • Can only use Category X in very specific cases Eg. GPL, LGPL, JSON, CC-BY-A Must be an optional, build tool, or system provided • Includes recursive dependencies!
  • 18. © 2019 Cloudera, Inc. All rights reserved. 18 MISTAKE 10. LICENSING PROBLEMS (CONT.) • Can sneak up on you Updated aircompressor dependency from 0.10 to 0.15 Started using the new zstd codec Previously excluded slice dependency was now required Slice depends on jol-core, which is GPL. • Fortunately, the use of slice was for one method Worked with aircompressor to get a fix • Copying code from stack overflow has the same problem!
  • 19. © 2019 Cloudera, Inc. All rights reserved. 19 SUMMARY • Build your community Give talks Make the website Make the project easy to build Make the community friendly – look at Beam talk • Do your work in the open • Train your employees in Apache & open source Hortonworks training - https://s.apache.org/apache-training