It isn't easy to drink from the technology firehose of today's Internet economy. At Connexity, we have gone from home-grown MapReduce frameworks and custom in-house search engines to extensive use of Apache Hadoop, Hive, Pig, Cassandra, Solr and other technologies to power our business. This talk will explore some of the evolutionary steps that we've made and the lessons you might draw from our 15+ years of swimming with the Internet sharks.
2. Connexity
Shopping powers our marketing platforms
• Paid Search & Marketplace
Performance-based marketing that finds in-market shoppers and delivers conversions at lower cost.
• Bizrate Insights
A reporting and ratings platform that captures the power of the consumer voice.
• Display Media
An audience activation platform that integrates retail data and programmatic buying.
6. Lessons Learned
“There’s a funny thing about regret... It’s better to regret something you have done, than something you haven’t.” – Gibby Haynes
7. A few of our production graduates
o Use of Cassandra
o SitePerf: in-house availability monitoring tool
o Several different customer-facing advertising products
o Hadoop implementations of core bidding platform
o Mock Service: Like Wiremock with persistence to MySQL
o Numerous internal tools for managing our systems
R & D
10% time: Give all engineers the opportunity to experiment
8. Quality Assurance
Any new technology choice should improve or maintain test automation coverage
Case Study: Hadoop + Solr + BDD
9. Existing Technologies
Reasons to stay with an older technology
1. It works well
2. Your business depends on it
3. Your team is very knowledgeable in its operation
10. New Technologies
Reasons to use a new technology
1. It makes new things possible or very difficult things easier
• Hadoop / MapReduce
• Auto-sharding distributed key-value data stores (Cassandra, HBase, VoltDB, Riak, etc.)
• Distributed stream-processing systems (Storm)
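To make "auto-sharding" a little more concrete, here is a minimal sketch of consistent hashing, one common way to assign keys to nodes so that adding or removing a node moves only a fraction of the data. Dynamo-style stores such as Cassandra and Riak are built on variants of this idea, but the class below is a hypothetical illustration and not the partitioner of any real data store. The names `HashRingSketch`, `nodeFor`, and the virtual-node count are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: not the partitioner of any real data store.
class HashRingSketch {
    // Sorted map from a position on the ring to the node that owns it.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRingSketch(List<String> nodes, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String node : nodes) {
            addNode(node);
        }
    }

    // Each node is placed at several positions to spread load more evenly.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // A key belongs to the first node position at or after its hash,
    // wrapping around to the start of the ring if there is none.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Use the first 8 bytes of an MD5 digest as a 64-bit ring position.
    private static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a ring built from three nodes, removing one node reassigns only the keys that node owned; keys owned by the other two stay where they were. Real stores layer replication, rebalancing, and failure handling on top of this.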
11. New Technologies
Reasons to use a new technology
2. It will save your company money
• Hardware
• Software Licensing
• Bandwidth
• Power Consumption
12. New Technologies
Reasons to use a new technology
3. It will save you time
• Time to market
• Time spent on operational complexity
• Time fighting fires
• Compute time
13. New Technologies
Reasons to use a new technology
4. It brings you in line with industry standards
• Moving from home-grown frameworks to Hadoop, Solr
• Where possible, running on JVM-based systems
14. Big Data Trends
o Like yours, our working dataset is only growing
o We are consolidating the number and variety of NoSQL solutions that we use
o We’re looking at better abstractions for Java MapReduce programming: Crunch, Cascading, …
o We have dipped our toes in the water with Storm, but expect heavier stream-processing needs soon
o We are still looking for a bulletproof way of importing data from various sources into Hadoop: LinkedIn’s Gobblin shows some promise there
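As a rough illustration of why higher-level abstractions for MapReduce are attractive, the sketch below spells out the three phases of a word count in plain Java. It uses no Hadoop, Crunch, or Cascading APIs; the class and method names (`WordCountSketch`, `map`, `shuffle`, `reduce`, `run`) are hypothetical and exist only for this example. Even this trivial job needs three separate pieces of logic, which is the boilerplate that pipeline-style libraries aim to hide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the MapReduce phases.
// Hypothetical sketch: uses no Hadoop, Crunch, or Cascading APIs.
class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group values by key.
    // A real framework does this across machines; here it is in memory.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse all values for one key into a single count.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Wire the three phases together for a list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((word, counts) -> result.put(word, reduce(counts)));
        return result;
    }
}
```

Calling `WordCountSketch.run(List.of("Hadoop and Solr", "hadoop, hive"))` counts "hadoop" twice and every other word once.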
Editor's notes
This property has been true of most big-data technologies we’ve worked with
Especially open source ones
Any technology that represents a step back in testability should give you a horrible icky feeling
This example is Cucumber’s Gherkin DSL
Executes with every build
Runs against MiniMRCluster, starts a real Solr instance, executes all the real code in integration