HBaseCon, May 2012

HBase Filters
Lars George, Solutions Architect
Agenda

1    Introduction
2    Comparison Filters
3    Dedicated Filters
4    Decorating Filters
5    Combining Filters
6    Custom Filters




2                   ©2012 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction
                         or redistribution without written permission is prohibited.
About Me

    •  Solutions Architect @ Cloudera
    •  Apache HBase & Whirr Committer
    •  Author of
           HBase – The Definitive Guide
    •  Working with HBase since the end
       of 2007
    •  Organizer of the Munich OpenHUG
    •  Speaker at conferences (FOSDEM,
       Hadoop World)

Introduction to Filters

    •  Used in combination with get() and scan()
       API calls
    •  Steps:
      –  Create Filter instance
      –  Create Get or Scan instance
      –  Assign Filter to Get or Scan
      –  Call API and enjoy
    •  More fine-grained control over what is
       returned to the client

Filter Features

    •  Allow the client to further narrow down what is
       retrieved
      –  Not just per row or column key, or per column
         family
    •  Predicate Pushdown
      –  Move filtering from the client to the server to
         reduce network traffic
    •  Performance implications vary,
       depending on the use case


Filter Pushdown




Filter Features (cont.)

    •  Filters have access to the entire row to
       decide its fate
      –  Access to KeyValue instances to check row keys,
         column qualifiers, timestamps, or values
    •  Scan batching might conflict with the above
       and can trigger an IncompatibleFilterException
      –  Example: DependentColumnFilter
    •  There is no cross-invocation state
      –  Rows cannot be filtered based on other, dependent rows


Available Filters

    •  Many filters are supplied by HBase
      –  Based on row key, column family, or column
         qualifier
      –  Paging through rows and columns
      –  Based on dependencies

    •  Write your own filters
      –  Use the FilterBase class to get a no-op
         skeleton and fill in the gaps


Comparison Filters

 •  Based on the CompareFilter class
 •  Adds the compare() method to
    FilterBase
 •  Takes an operator that defines how the
    comparison is performed
     –  Predefined by the client API
 •  Also needs a comparator to do the actual
    check
     –  HBase supplies a large set

Comparison Operators




Comparators




Comparison Filters (cont.)

 •  Not all combinations of operator and
    comparator make sense
     –  For example, the SubstringComparator
        returns only 0 (match) and 1 (no match)
     –  Only EQUAL and NOT_EQUAL are useful
     –  Using other operators is allowed but will most
        likely yield unexpected results
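The interplay of operator and comparator can be illustrated with a small, self-contained Java model that uses no HBase classes. As a simplifying assumption, the comparator compares its own reference value against the cell data (negative, zero, or positive result), and a cell is kept when "data <operator> reference" holds. The names CompareModel, binaryCompare, substringCompare, and keep are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Simplified, hypothetical model of operator + comparator semantics (no HBase classes).
class CompareModel {
    // Same names as the operators in CompareFilter.CompareOp (NO_OP omitted).
    enum Op { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    // Unsigned, lexicographic byte comparison of two UTF-8 strings.
    static int binaryCompare(String reference, String data) {
        byte[] a = reference.getBytes(StandardCharsets.UTF_8);
        byte[] b = data.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }

    // Substring check: 0 means "contains" (case-insensitive), 1 means "does not".
    static int substringCompare(String substr, String data) {
        return data.toLowerCase(Locale.ROOT).contains(substr.toLowerCase(Locale.ROOT)) ? 0 : 1;
    }

    // True if the cell is kept, i.e. "data <op> reference" holds.
    // The result compares reference against data, so the signs read "backwards".
    static boolean keep(Op op, int result) {
        switch (op) {
            case LESS:             return result > 0;   // data < reference
            case LESS_OR_EQUAL:    return result >= 0;  // data <= reference
            case EQUAL:            return result == 0;
            case NOT_EQUAL:        return result != 0;
            case GREATER_OR_EQUAL: return result <= 0;  // data >= reference
            case GREATER:          return result < 0;   // data > reference
            default: throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }
}
```

Because the substring check only ever yields 0 or 1, in this model GREATER never keeps anything and LESS keeps exactly the non-matching cells: legal, but rarely what was intended.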




Comparison Filters (cont.)

 •  Most HBase filters filter data out
 •  Comparison filters work in reverse, as they
    include the matching data
     –  Be mindful when selecting the comparison
        operator!




Available Comparison Filters

 •  Row Filter
     –  Based on row key comparisons
 •  Family Filter
     –  Based on column family names
 •  Qualifier Filter
     –  Based on column names, aka qualifiers
 •  Value Filter
     –  Based on the actual value of a column


Available Comparison Filters (cont.)

 •  Dependent Column Filter
     –  Based on the timestamp of a reference column
     –  Includes all columns that have the same
        timestamp
     –  Requires access to the entire row, since a
        batch might not contain the reference column
        •  No scanner batching allowed!
     –  Useful for loading interdependent changes
        within a row
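The core idea can be modeled for a single row in plain Java: collect the timestamps of the reference column, then keep only the cells written with one of those timestamps. This sketch ignores column families and the optional value check; the names DependentColumnModel, Cell, filterRow, and dropReference are hypothetical, with dropReference loosely mirroring the filter's dropDependentColumn option.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of DependentColumnFilter semantics within one row.
class DependentColumnModel {
    static final class Cell {
        final String qualifier;
        final long timestamp;
        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
        @Override public String toString() { return qualifier + "@" + timestamp; }
    }

    // Keeps cells whose timestamp matches a version of the reference column.
    // The whole row must be visible first, which is why batching cannot be used.
    static List<Cell> filterRow(List<Cell> row, String reference, boolean dropReference) {
        Set<Long> stamps = new HashSet<>();
        for (Cell c : row) {
            if (c.qualifier.equals(reference)) stamps.add(c.timestamp);
        }
        List<Cell> kept = new ArrayList<>();
        for (Cell c : row) {
            if (dropReference && c.qualifier.equals(reference)) continue;
            if (stamps.contains(c.timestamp)) kept.add(c);
        }
        return kept;
    }
}
```

For a row holding ref@100, a@100, b@200, and c@100, the model keeps ref, a, and c, or only a and c when the reference column is dropped.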


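Example Code
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Assumes "table" is an open HTable for the example table.
Scan scan = new Scan();
// Only return column "col-0" of family "colfam1".
scan.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-0"));
// Keep rows whose key is less than or equal to "row-22" (byte order).
Filter filter = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
    new BinaryComparator(Bytes.toBytes("row-22")));
scan.setFilter(filter);

ResultScanner scanner = table.getScanner(scan);
try {
  for (Result res : scanner) {
    System.out.println(res);
  }
} finally {
  scanner.close(); // always release the server-side scanner
}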

Example Output
 keyvalues={row-1/colfam1:col-0/1301043190260/Put/vlen=7}
 keyvalues={row-10/colfam1:col-0/1301043190908/Put/vlen=8}
 keyvalues={row-100/colfam1:col-0/1301043195275/Put/vlen=9}
 keyvalues={row-11/colfam1:col-0/1301043190982/Put/vlen=8}
 keyvalues={row-12/colfam1:col-0/1301043191040/Put/vlen=8}
 keyvalues={row-13/colfam1:col-0/1301043191172/Put/vlen=8}
 keyvalues={row-14/colfam1:col-0/1301043191318/Put/vlen=8}
 keyvalues={row-15/colfam1:col-0/1301043191429/Put/vlen=8}
 keyvalues={row-16/colfam1:col-0/1301043191509/Put/vlen=8}
 keyvalues={row-17/colfam1:col-0/1301043191593/Put/vlen=8}
 keyvalues={row-18/colfam1:col-0/1301043191673/Put/vlen=8}
 keyvalues={row-19/colfam1:col-0/1301043191771/Put/vlen=8}
 keyvalues={row-2/colfam1:col-0/1301043190346/Put/vlen=7}
 keyvalues={row-20/colfam1:col-0/1301043191841/Put/vlen=8}
 keyvalues={row-21/colfam1:col-0/1301043191933/Put/vlen=8}
 keyvalues={row-22/colfam1:col-0/1301043191998/Put/vlen=8}
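The output is ordered byte by byte, not numerically: row-100 sorts before row-22, while row-3 sorts after it and is filtered out. A small Java model reproduces the listing under the hypothetical assumption that the table holds the keys row-1 through row-100. RowOrderModel and rowsUpTo are illustrative names, and compareBytes mimics an unsigned lexicographic comparison in the style of Bytes.compareTo().

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of the scan above; assumes rows "row-1" .. "row-<rowCount>".
class RowOrderModel {
    // Unsigned lexicographic comparison of the UTF-8 bytes.
    static int compareBytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff);
            if (diff != 0) return diff;
        }
        return x.length - y.length;
    }

    // Row keys a LESS_OR_EQUAL row filter keeps, in table (byte) order.
    static List<String> rowsUpTo(String maxKey, int rowCount) {
        TreeSet<String> table = new TreeSet<>(RowOrderModel::compareBytes);
        for (int i = 1; i <= rowCount; i++) table.add("row-" + i);
        List<String> result = new ArrayList<>();
        for (String row : table) {
            if (compareBytes(row, maxKey) <= 0) result.add(row);
        }
        return result;
    }
}
```

Under that assumption, rowsUpTo("row-22", 100) yields the same 16 keys, in the same order, as the output above. If numeric order matters, zero-padded keys such as row-022 are a common remedy.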



Dedicated Filters

 •  Based directly on FilterBase class
 •  Often less useful for get() calls, since
    entire rows are filtered




Available Dedicated Filters

 •  Single Column Value Filter
     –  Filters rows based on one specific column
     –  Extra features
       •  “Filter if missing”
       •  “Get latest version only”
     –  The column must be part of the scan selection
       •  Otherwise it is all or nothing
     –  Also needs a compare operation and either a
        value or a comparator
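The "filter if missing" switch decides what happens to rows that lack the column, and it also explains the "all or nothing" remark: if the column is not part of the scan selection, every row looks like it is missing the column. A hypothetical plain-Java sketch of that decision for one row, assuming EQUAL as the compare operation and a map from qualifier to latest value:

```java
import java.util.Map;

// Hypothetical, simplified model of SingleColumnValueFilter's per-row decision.
class SingleColumnValueModel {
    // row maps column qualifier -> latest value; returns true if the row is returned.
    static boolean includeRow(Map<String, String> row, String column,
                              String expected, boolean filterIfMissing) {
        String value = row.get(column);
        if (value == null) {
            // Column absent (or not selected): keep the row unless "filter if missing" is set.
            return !filterIfMissing;
        }
        // Modeled with EQUAL as the compare operation.
        return value.equals(expected);
    }
}
```

With filterIfMissing left false, every row without the column passes, so an unselected column lets everything through; set to true, it filters everything out.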


Available Dedicated Filters (cont.)

 •  Single Column Value Exclude Filter
     –  Same as the previous filter, but excludes the
        reference column from the result
 •  Prefix Filter
     –  Based on the prefix of row keys
     –  Can end the scan early!
       •  Combine with a start row for best performance
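Why the prefix filter can stop early, and why a start row helps, can be shown with a plain-Java model over a sorted set of row keys. It relies on Java String order, which matches byte order for ASCII keys; PrefixScanModel and Outcome are hypothetical names. Without a start row the scan still reads every key before the prefix; with one, it begins inside the prefix range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical model of a prefix scan; not HBase code.
class PrefixScanModel {
    static final class Outcome {
        final List<String> rows = new ArrayList<>();
        int examined; // row keys the "scanner" had to look at
    }

    static Outcome scan(NavigableSet<String> table, String prefix, boolean useStartRow) {
        Outcome out = new Outcome();
        // Using the prefix as the start row skips every smaller key up front.
        Iterable<String> rows = useStartRow ? table.tailSet(prefix, true) : table;
        for (String row : rows) {
            out.examined++;
            if (row.startsWith(prefix)) {
                out.rows.add(row);
            } else if (row.compareTo(prefix) > 0) {
                break; // past the prefix range: no later key can match (early out)
            }
        }
        return out;
    }
}
```

For the keys a-1, a-2, b-1, b-2, c-1 and the prefix "b-", both variants return b-1 and b-2, but the scan with a start row examines 3 keys instead of 5.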




Available Dedicated Filters (cont.)
 •  Page Filter
     –  Allows pagination through rows
     –  Needs to be combined with setting the start row on
        subsequent scans
     –  Can end the scan early when the limit is reached
 •  Key Only Filter
     –  Drops the value of every column
 •  First Key Only Filter
     –  Returns only the first column key of each row
     –  Useful for row counters or “get newest post”-type
        applications
     –  Can skip the rest of the row early
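The paging loop can be sketched in plain Java: each scan starts just after the last row of the previous page, and a common way to build that start row is to append a zero byte to the last key, giving the smallest key greater than it. This is a hypothetical model (PagingModel, scanPage, allPages) that ignores regions; in HBase the page limit is applied by each region server separately, so the client should still cap the number of rows it accepts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of paging through rows with a PageFilter-style limit.
class PagingModel {
    // One "scan": start at startRow (inclusive) and return at most pageSize rows.
    static List<String> scanPage(TreeSet<String> table, String startRow, int pageSize) {
        List<String> page = new ArrayList<>();
        for (String row : table.tailSet(startRow, true)) {
            if (page.size() == pageSize) break; // limit reached: end the scan early
            page.add(row);
        }
        return page;
    }

    // Drives subsequent scans, each starting right after the previous page.
    static List<List<String>> allPages(TreeSet<String> table, int pageSize) {
        List<List<String>> pages = new ArrayList<>();
        String startRow = "";
        while (true) {
            List<String> page = scanPage(table, startRow, pageSize);
            if (page.isEmpty()) break;
            pages.add(page);
            // Appending a zero character yields the smallest key greater than the last row.
            startRow = page.get(page.size() - 1) + '\0';
        }
        return pages;
    }
}
```

With five rows r1 to r5 and a page size of 2, the model produces the pages [r1, r2], [r3, r4], and [r5].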


Available Dedicated Filters (cont.)

 •  Inclusive Stop Filter
     –  Unlike the exclusive stop row of a scan, this
        filter includes the final row
 •  Timestamp Filter
     –  Takes a list of timestamps to include in the result
 •  Column Count Get Filter
     –  Limits the number of columns returned by a
        get() call
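Two of these filters reduce to simple range and set checks, sketched here in plain Java under stated simplifications: the stop-row variant uses NavigableSet.subSet with an inclusive or exclusive upper bound, and the timestamp variant keeps only listed timestamps, in the spirit of HBase's TimestampsFilter. StopAndTimestampModel and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical models of two dedicated filters; not HBase code.
class StopAndTimestampModel {
    // A plain scan excludes stopRow; an InclusiveStopFilter-style scan also returns it.
    static List<String> scan(TreeSet<String> table, String startRow, String stopRow,
                             boolean inclusiveStop) {
        return new ArrayList<>(table.subSet(startRow, true, stopRow, inclusiveStop));
    }

    // TimestampsFilter-style check: keep only cells whose timestamp is listed.
    static List<Long> keepTimestamps(List<Long> cellTimestamps, Set<Long> allowed) {
        List<Long> kept = new ArrayList<>();
        for (Long ts : cellTimestamps) {
            if (allowed.contains(ts)) kept.add(ts);
        }
        return kept;
    }
}
```

Scanning row-2 to row-4 returns [row-2, row-3] with the default exclusive stop and [row-2, row-3, row-4] with the inclusive variant.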


Available Dedicated Filters (cont.)

 •  Column Pagination Filter
     –  Paginates through the columns within a
        row
     –  Skips to the offset and returns up to
        limit columns
 •  Column Prefix Filter
     –  Analogous to PrefixFilter, but matches
        column qualifiers
 •  Random Row Filter
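Column pagination within a row amounts to skipping offset columns and returning up to limit columns in sorted qualifier order. A hypothetical plain-Java sketch, with ColumnPagingModel.page mirroring the limit and offset parameters; once the page is full, the rest of the row can be skipped:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of column pagination within one row; not HBase code.
class ColumnPagingModel {
    static List<String> page(TreeSet<String> qualifiers, int limit, int offset) {
        List<String> result = new ArrayList<>();
        int index = 0;
        for (String q : qualifiers) {           // columns arrive in sorted order
            if (index >= offset + limit) break;  // page full: skip the rest of the row
            if (index >= offset) result.add(q);
            index++;
        }
        return result;
    }
}
```

For the qualifiers col-0 to col-9, a limit of 3 and an offset of 2 return col-2, col-3, and col-4; an offset near the end simply returns fewer columns.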

Decorating Filters

 •  Wrap other filters to gain additional control
    over the returned data
 •  Skip Filter
     –  Skips the entire row when a column is filtered out
     –  Not all filters are compatible
 •  While Match Filter
     –  Aborts the entire scan once the wrapped filter
        indicates that a row or column is omitted
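The difference between the two decorators can be seen in a row-level plain-Java model where a row is a list of integer cell values and the wrapped filter is a predicate. This is a hypothetical sketch (DecoratorModel, plain, skip, whileMatch); it glosses over how HBase treats the cells of the row in which a WhileMatchFilter aborts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, row-level model of the two decorating filters; not HBase code.
class DecoratorModel {
    // Undecorated: keep the cells that pass the wrapped check.
    static List<List<Integer>> plain(List<List<Integer>> rows, Predicate<Integer> passes) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> row : rows) {
            List<Integer> kept = new ArrayList<>();
            for (Integer cell : row) {
                if (passes.test(cell)) kept.add(cell);
            }
            if (!kept.isEmpty()) out.add(kept);
        }
        return out;
    }

    // SkipFilter-style: one filtered cell drops the entire row.
    static List<List<Integer>> skip(List<List<Integer>> rows, Predicate<Integer> passes) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> row : rows) {
            if (row.stream().allMatch(passes)) out.add(row);
        }
        return out;
    }

    // WhileMatchFilter-style: the first row with a filtered cell ends the whole scan.
    static List<List<Integer>> whileMatch(List<List<Integer>> rows, Predicate<Integer> passes) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> row : rows) {
            if (!row.stream().allMatch(passes)) break;
            out.add(row);
        }
        return out;
    }
}
```

For the rows [1, 2], [3, -1], [5, 6] and a "non-negative" check, the plain filter returns [[1, 2], [3], [5, 6]], the skip variant [[1, 2], [5, 6]], and the while-match variant only [[1, 2]].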


Combining Filters

 •  Implemented by the FilterList class
     –  Wraps a list of filters into a class that itself
        implements Filter
     –  Takes an optional operator that decides how to
        combine the results of the wrapped filters
        (default: MUST_PASS_ALL)




Combining Filters (cont.)

 •  Filter lists can contain other filter lists
 •  The operator is fixed per list, but nesting
    lists allows arbitrary combinations
 •  Using the proper List implementation
    helps control the filter execution order




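import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

List<Filter> filters = new ArrayList<Filter>();

// Filter 1: row key >= "row-03"
Filter filter1 = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
    new BinaryComparator(Bytes.toBytes("row-03")));
filters.add(filter1);
// Filter 2: row key <= "row-06"
Filter filter2 = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
    new BinaryComparator(Bytes.toBytes("row-06")));
filters.add(filter2);
// Filter 3: column qualifier matches the regex "col-0[03]"
Filter filter3 = new QualifierFilter(CompareFilter.CompareOp.EQUAL,
    new RegexStringComparator("col-0[03]"));
filters.add(filter3);

// Default operator MUST_PASS_ALL: data must pass every filter.
FilterList filterList1 = new FilterList(filters);
…
// MUST_PASS_ONE: passing any single filter is enough.
FilterList filterList2 = new FilterList(
    FilterList.Operator.MUST_PASS_ONE, filters);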
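The effect of the two operators can be approximated with a cell-level plain-Java model whose predicates mirror the three example filters. It assumes a hypothetical table with rows row-01 to row-10 and qualifiers col-00 to col-09; FilterListModel and countMatches are illustrative names. Row keys are compared with String.compareTo, which matches byte order for these ASCII keys, and the qualifier check uses Matcher.find(), assuming the regex comparator looks for a match anywhere in the qualifier.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical, cell-level model of FilterList's two operators; not HBase code.
class FilterListModel {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    static final class Cell {
        final String row, qualifier;
        Cell(String row, String qualifier) { this.row = row; this.qualifier = qualifier; }
    }

    private static final Pattern QUALIFIER = Pattern.compile("col-0[03]");

    // Combines the wrapped predicates with AND (MUST_PASS_ALL) or OR (MUST_PASS_ONE).
    static Predicate<Cell> combine(Operator op, List<Predicate<Cell>> filters) {
        return cell -> op == Operator.MUST_PASS_ALL
                ? filters.stream().allMatch(f -> f.test(cell))
                : filters.stream().anyMatch(f -> f.test(cell));
    }

    // Predicates mirroring filter1, filter2, and filter3 from the example.
    static List<Predicate<Cell>> exampleFilters() {
        return Arrays.<Predicate<Cell>>asList(
                c -> c.row.compareTo("row-03") >= 0,
                c -> c.row.compareTo("row-06") <= 0,
                c -> QUALIFIER.matcher(c.qualifier).find());
    }

    // Counts the cells of the hypothetical 10 x 10 table that pass the list.
    static int countMatches(Operator op) {
        Predicate<Cell> list = combine(op, exampleFilters());
        int count = 0;
        for (int r = 1; r <= 10; r++) {
            for (int q = 0; q < 10; q++) {
                Cell cell = new Cell(String.format("row-%02d", r), String.format("col-%02d", q));
                if (list.test(cell)) count++;
            }
        }
        return count;
    }
}
```

In this model filterList1 keeps 8 cells (4 rows times 2 columns), while filterList2 keeps all 100, because every row satisfies at least one of the two row filters.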


Custom Filter

 •  Allows users to add missing filters
 •  Either implement the Filter interface or extend
    the FilterBase skeleton
 •  Provides hooks called at different stages
    of the read process




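Filter Interface
public interface Filter extends Writable {
  public enum ReturnCode {
    INCLUDE, SKIP, NEXT_COL, NEXT_ROW,
    SEEK_NEXT_USING_HINT }
  // Resets per-row state between rows
  public void reset();
  // Returns true to drop the entire row based on its key
  public boolean filterRowKey(byte[] buffer,
    int offset, int length);
  // Returns true to end the whole scan early
  public boolean filterAllRemaining();
  // Decides what to do with each KeyValue of a row
  public ReturnCode filterKeyValue(KeyValue v);
  // Allows direct modification of the final list of KeyValues for a row
  public void filterRow(List<KeyValue> kvs);
  // Returns true if this filter actively uses filterRow(List)
  public boolean hasFilterRow();
  // Last chance to drop the entire row (true drops it)
  public boolean filterRow();
  // Next key to seek to when SEEK_NEXT_USING_HINT is returned
  public KeyValue getNextKeyHint(KeyValue
    currentKV);
}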


Filter Return Codes




Merge Reads




Filter Flow

 •  Filter hooks are called at
    different stages
 •  Seeks are done initially to
    find the next KeyValue
     –  Hint from previous filter
        invocation might help
 •  Early out checks improve
    performance
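The order of the hooks and the value of early-out checks can be illustrated with a heavily simplified Java model: before each row it checks filterAllRemaining(), resets per-row state, lets filterRowKey() drop the row, runs filterKeyValue() on each cell, and finally lets filterRow() veto the row. This is an illustration under these assumptions, not the region server's actual scanner loop; it handles only INCLUDE, SKIP, and NEXT_ROW, uses Java 8 default methods, and all names are hypothetical. The sample filter behaves like a first-key-only filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Heavily simplified, hypothetical model of the order in which filter hooks run.
class FilterFlowModel {
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    // No-op defaults, in the spirit of a FilterBase-style skeleton.
    interface SimpleFilter {
        default void reset() {}                                    // clear per-row state
        default boolean filterAllRemaining() { return false; }    // true ends the scan
        default boolean filterRowKey(String row) { return false; } // true drops the row
        default ReturnCode filterKeyValue(String row, String column) {
            return ReturnCode.INCLUDE;
        }
        default boolean filterRow() { return false; }             // true drops the row
    }

    static final class Outcome {
        final List<String> cells = new ArrayList<>();
        int examined; // cells passed to filterKeyValue()
    }

    static Outcome scan(Map<String, List<String>> table, SimpleFilter filter) {
        Outcome out = new Outcome();
        for (Map.Entry<String, List<String>> entry : table.entrySet()) {
            if (filter.filterAllRemaining()) break;  // early out for the whole scan
            filter.reset();
            String row = entry.getKey();
            if (filter.filterRowKey(row)) continue;  // early out for this row
            List<String> kept = new ArrayList<>();
            for (String column : entry.getValue()) {
                out.examined++;
                ReturnCode rc = filter.filterKeyValue(row, column);
                if (rc == ReturnCode.INCLUDE) kept.add(row + "/" + column);
                else if (rc == ReturnCode.NEXT_ROW) break; // skip the rest of the row
                // SKIP: just move on to the next cell
            }
            if (!filter.filterRow()) out.cells.addAll(kept);
        }
        return out;
    }

    // Keeps only the first cell of each row, then jumps to the next row.
    static SimpleFilter firstCellOnly() {
        return new SimpleFilter() {
            private boolean seen;
            @Override public void reset() { seen = false; }
            @Override public ReturnCode filterKeyValue(String row, String column) {
                if (seen) return ReturnCode.NEXT_ROW;
                seen = true;
                return ReturnCode.INCLUDE;
            }
        };
    }
}
```

For a table of 3 rows with 4 cells each, the no-op filter examines all 12 cells, while firstCellOnly() examines 6 and returns 3.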


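Example Code
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

public class CustomFilter extends FilterBase {
  // Value to look for (set by the constructor, or by readFields() on the server)
  private byte[] value = null;
  // Whether to drop the current row; cleared once a matching value is seen
  private boolean filterRow = true;

  // No-argument constructor, needed for Writable deserialization
  public CustomFilter() { super(); }
  public CustomFilter(byte[] value) { this.value = value; }

  @Override
  public void reset() { this.filterRow = true; }  // each new row starts as "drop"

  @Override
  public ReturnCode filterKeyValue(KeyValue kv) {
    // One cell with the wanted value keeps the whole row
    if (Bytes.compareTo(value, kv.getValue()) == 0) {
      filterRow = false;
    }
    return ReturnCode.INCLUDE;  // include every cell; the decision is made per row
  }

  @Override
  public boolean filterRow() { return filterRow; }  // true drops the row
  ...
}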
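The CustomFilter logic can be tried without a cluster by replaying its hooks in plain Java. This sketch copies the filter's state handling into a hypothetical CustomFilterModel class that uses no HBase types (Arrays.equals stands in for Bytes.compareTo(...) == 0) and calls reset(), filterKeyValue(), and filterRow() once per row, in the order described for the filter flow.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java stand-in for the CustomFilter above; no HBase classes.
class CustomFilterModel {
    private final byte[] value;
    private boolean filterRow = true;

    CustomFilterModel(byte[] value) { this.value = value; }

    void reset() { filterRow = true; }          // each new row starts as "drop"

    // Every cell is included; a matching value marks the row as worth keeping.
    void filterKeyValue(byte[] cellValue) {
        if (Arrays.equals(value, cellValue)) filterRow = false;
    }

    boolean filterRow() { return filterRow; }   // true drops the whole row

    // Drives the hooks one row at a time and returns the keys of the kept rows.
    static List<String> scan(Map<String, List<String>> table, String target) {
        CustomFilterModel filter = new CustomFilterModel(target.getBytes(StandardCharsets.UTF_8));
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : table.entrySet()) {
            filter.reset();
            for (String v : e.getValue()) {
                filter.filterKeyValue(v.getBytes(StandardCharsets.UTF_8));
            }
            if (!filter.filterRow()) rows.add(e.getKey());
        }
        return rows;
    }
}
```

A row is returned in full as soon as any one of its cells holds the target value; rows without such a cell are dropped entirely.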
Deploying Custom Filters

 •    Provide a JAR file containing the filter class
 •    Deploy the JAR to the RegionServers
 •    Add the JAR to HBASE_CLASSPATH
 •    Restart the RegionServers

 •  Tip: Testing on a cluster is more involved, so test
    on a local machine first


Summary




Summary (cont.)




Oracle ADF Architecture TV - Development - Version ControlOracle ADF Architecture TV - Development - Version Control
Oracle ADF Architecture TV - Development - Version ControlChris Muir
 
OUGLS 2016: Guided Tour On The MySQL Source Code
OUGLS 2016: Guided Tour On The MySQL Source CodeOUGLS 2016: Guided Tour On The MySQL Source Code
OUGLS 2016: Guided Tour On The MySQL Source CodeGeorgi Kodinov
 
44 Slides About 22 Modules
44 Slides About 22 Modules44 Slides About 22 Modules
44 Slides About 22 Modulesheyrocker
 
Oracle Enterprise Manager Security A Practitioners Guide
Oracle Enterprise Manager Security A Practitioners GuideOracle Enterprise Manager Security A Practitioners Guide
Oracle Enterprise Manager Security A Practitioners GuideCourtney Llamas
 
MySQL London Tech Tour March 2015 - MySQL Fabric
MySQL London Tech Tour March 2015 - MySQL FabricMySQL London Tech Tour March 2015 - MySQL Fabric
MySQL London Tech Tour March 2015 - MySQL FabricMark Swarbrick
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopCloudera, Inc.
 
FOSDEM19 MySQL Component Infrastructure
FOSDEM19 MySQL Component InfrastructureFOSDEM19 MySQL Component Infrastructure
FOSDEM19 MySQL Component InfrastructureGeorgi Kodinov
 
Oracle Enterprise Manager Security: A Practitioners Guide
Oracle Enterprise Manager Security: A Practitioners GuideOracle Enterprise Manager Security: A Practitioners Guide
Oracle Enterprise Manager Security: A Practitioners GuideCourtney Llamas
 
The Power Boost of Atelier
The Power Boost of AtelierThe Power Boost of Atelier
The Power Boost of AtelierMichelle Stolwyk
 
(ATS4-PLAT03) Balancing Security with access for Development
(ATS4-PLAT03) Balancing Security with access for Development(ATS4-PLAT03) Balancing Security with access for Development
(ATS4-PLAT03) Balancing Security with access for DevelopmentBIOVIA
 
Advance java session 17
Advance java session 17Advance java session 17
Advance java session 17Smita B Kumar
 
Extending ZF & Extending With ZF
Extending ZF & Extending With ZFExtending ZF & Extending With ZF
Extending ZF & Extending With ZFRalph Schindler
 
Intro to Apache Kafka
Intro to Apache KafkaIntro to Apache Kafka
Intro to Apache KafkaJason Hubbard
 
Provisioning with Oracle Cloud Stack Manager
Provisioning with Oracle Cloud Stack ManagerProvisioning with Oracle Cloud Stack Manager
Provisioning with Oracle Cloud Stack ManagerSimon Haslam
 
Apache - Mod-Rewrite
Apache - Mod-RewriteApache - Mod-Rewrite
Apache - Mod-RewriteMarakana Inc.
 
Using MySQL in Automated Testing
Using MySQL in Automated TestingUsing MySQL in Automated Testing
Using MySQL in Automated TestingMorgan Tocker
 
Big Data Fundamentals 6.6.18
Big Data Fundamentals 6.6.18Big Data Fundamentals 6.6.18
Big Data Fundamentals 6.6.18Cloudera, Inc.
 

Similaire à HBaseCon 2012 | HBase Filtering - Lars George, Cloudera (20)

Apache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsApache NiFi SDLC Improvements
Apache NiFi SDLC Improvements
 
Openfest15 MySQL Plugin Development
Openfest15 MySQL Plugin DevelopmentOpenfest15 MySQL Plugin Development
Openfest15 MySQL Plugin Development
 
Oracle ADF Architecture TV - Development - Version Control
Oracle ADF Architecture TV - Development - Version ControlOracle ADF Architecture TV - Development - Version Control
Oracle ADF Architecture TV - Development - Version Control
 
OUGLS 2016: Guided Tour On The MySQL Source Code
OUGLS 2016: Guided Tour On The MySQL Source CodeOUGLS 2016: Guided Tour On The MySQL Source Code
OUGLS 2016: Guided Tour On The MySQL Source Code
 
44 Slides About 22 Modules
44 Slides About 22 Modules44 Slides About 22 Modules
44 Slides About 22 Modules
 
Oracle Enterprise Manager Security A Practitioners Guide
Oracle Enterprise Manager Security A Practitioners GuideOracle Enterprise Manager Security A Practitioners Guide
Oracle Enterprise Manager Security A Practitioners Guide
 
MySQL London Tech Tour March 2015 - MySQL Fabric
MySQL London Tech Tour March 2015 - MySQL FabricMySQL London Tech Tour March 2015 - MySQL Fabric
MySQL London Tech Tour March 2015 - MySQL Fabric
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for Hadoop
 
FOSDEM19 MySQL Component Infrastructure
FOSDEM19 MySQL Component InfrastructureFOSDEM19 MySQL Component Infrastructure
FOSDEM19 MySQL Component Infrastructure
 
Oracle Enterprise Manager Security: A Practitioners Guide
Oracle Enterprise Manager Security: A Practitioners GuideOracle Enterprise Manager Security: A Practitioners Guide
Oracle Enterprise Manager Security: A Practitioners Guide
 
The Power Boost of Atelier
The Power Boost of AtelierThe Power Boost of Atelier
The Power Boost of Atelier
 
(ATS4-PLAT03) Balancing Security with access for Development
(ATS4-PLAT03) Balancing Security with access for Development(ATS4-PLAT03) Balancing Security with access for Development
(ATS4-PLAT03) Balancing Security with access for Development
 
Advance java session 17
Advance java session 17Advance java session 17
Advance java session 17
 
Extending ZF & Extending With ZF
Extending ZF & Extending With ZFExtending ZF & Extending With ZF
Extending ZF & Extending With ZF
 
Intro to Apache Kafka
Intro to Apache KafkaIntro to Apache Kafka
Intro to Apache Kafka
 
Provisioning with Oracle Cloud Stack Manager
Provisioning with Oracle Cloud Stack ManagerProvisioning with Oracle Cloud Stack Manager
Provisioning with Oracle Cloud Stack Manager
 
Apache - Mod-Rewrite
Apache - Mod-RewriteApache - Mod-Rewrite
Apache - Mod-Rewrite
 
Using MySQL in Automated Testing
Using MySQL in Automated TestingUsing MySQL in Automated Testing
Using MySQL in Automated Testing
 
Developer day v2
Developer day v2Developer day v2
Developer day v2
 
Big Data Fundamentals 6.6.18
Big Data Fundamentals 6.6.18Big Data Fundamentals 6.6.18
Big Data Fundamentals 6.6.18
 

Plus de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Plus de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Dernier

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 

Dernier (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

HBaseCon 2012 | HBase Filtering - Lars George, Cloudera

  • 1. HBaseCon, May 2012
    HBase Filters
    Lars George, Solutions Architect
  • 2. Agenda
    1  Introduction
    2  Comparison Filters
    3  Dedicated Filters
    4  Decorating Filters
    5  Combining Filters
    6  Custom Filters
    ©2012 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.
  • 3. About Me
    •  Solutions Architect @ Cloudera
    •  Apache HBase & Whirr Committer
    •  Author of HBase – The Definitive Guide
    •  Working with HBase since end of 2007
    •  Organizer of the Munich OpenHUG
    •  Speaker at conferences (FOSDEM, Hadoop World)
  • 4. Introduction to Filters
    •  Used in combination with the get() and scan() API calls
    •  Steps:
      –  Create a Filter instance
      –  Create a Get or Scan instance
      –  Assign the Filter to the Get or Scan
      –  Call the API and enjoy
    •  More fine-grained control over what is returned to the client
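The four steps can be mimicked with a small in-memory model so the flow runs without a cluster. This is a hypothetical sketch, not the HBase client API: `ToyTable`, `ToyGet`, and the `Predicate`-based filter are invented stand-ins for `HTable`, `Get`, and `Filter`, and the filter here only looks at column qualifiers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical stand-ins for HTable, Get and Filter; not the HBase client API.
class FilterStepsSketch {

    // A "Get" names one row and optionally carries a filter on column qualifiers.
    static final class ToyGet {
        final String row;
        Predicate<String> filter = q -> true; // default: keep every column

        ToyGet(String row) { this.row = row; }

        void setFilter(Predicate<String> f) { this.filter = f; }
    }

    // A "table" maps row key -> (qualifier -> value), both sorted.
    static final class ToyTable {
        final Map<String, TreeMap<String, String>> rows = new TreeMap<>();

        void put(String row, String qualifier, String value) {
            rows.computeIfAbsent(row, r -> new TreeMap<>()).put(qualifier, value);
        }

        // The filter is evaluated inside get(), i.e. on the "server" side of the call.
        List<String> get(ToyGet get) {
            List<String> result = new ArrayList<>();
            TreeMap<String, String> cols = rows.get(get.row);
            if (cols == null) return result;
            for (Map.Entry<String, String> e : cols.entrySet()) {
                if (get.filter.test(e.getKey())) {
                    result.add(e.getKey() + "=" + e.getValue());
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.put("row-1", "col-0", "val-0");
        table.put("row-1", "col-1", "val-1");
        table.put("row-1", "col-2", "val-2");

        Predicate<String> filter = q -> !q.equals("col-1"); // 1. create filter
        ToyGet get = new ToyGet("row-1");                     // 2. create Get
        get.setFilter(filter);                                // 3. assign filter
        System.out.println(table.get(get));                   // 4. call API
        // prints [col-0=val-0, col-2=val-2]
    }
}
```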
  • 5. Filter Features
    •  Allow the client to further narrow down what is retrieved
      –  Not just per row or column key, or per column family
    •  Predicate pushdown
      –  Move filtering from the client to the server to reduce network traffic
    •  Performance implications vary, depending on the use case
  • 6. Filter Pushdown
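To make the pushdown argument concrete, the sketch below counts how many rows cross the "network" when the same predicate is evaluated on the client versus on the server. It is a hypothetical model, not HBase code: `PushdownSketch`, `ScanStats`, and the 1,000-row table are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical model of predicate pushdown (not HBase code): it counts how many
// rows cross the "network" when the predicate runs on the client vs. the server.
class PushdownSketch {

    static final class ScanStats {
        final int transferred;     // rows sent from server to client
        final List<String> result; // rows the application finally sees

        ScanStats(int transferred, List<String> result) {
            this.transferred = transferred;
            this.result = result;
        }
    }

    // Server-side rows with keys row-0 .. row-(n-1).
    static List<String> serverRows(int n) {
        return IntStream.range(0, n).mapToObj(i -> "row-" + i).collect(Collectors.toList());
    }

    static ScanStats scan(List<String> rows, Predicate<String> p, boolean pushdown) {
        if (pushdown) {
            // Predicate evaluated on the server: only matching rows are sent.
            List<String> sent = rows.stream().filter(p).collect(Collectors.toList());
            return new ScanStats(sent.size(), sent);
        }
        // Predicate evaluated on the client: every row is sent, then filtered.
        List<String> sent = new ArrayList<>(rows);
        List<String> kept = sent.stream().filter(p).collect(Collectors.toList());
        return new ScanStats(sent.size(), kept);
    }

    public static void main(String[] args) {
        List<String> rows = serverRows(1000);
        Predicate<String> endsIn7 = r -> r.endsWith("7");
        ScanStats client = scan(rows, endsIn7, false);
        ScanStats server = scan(rows, endsIn7, true);
        // Same rows reach the application; only the transfer volume differs.
        System.out.println(client.transferred + " vs " + server.transferred); // 1000 vs 100
        System.out.println(client.result.equals(server.result));              // true
    }
}
```

The application sees identical results either way; the saving is purely in what has to be shipped, which is why the benefit depends on how selective the filter is.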
  • 7. Filter Features (cont.)
    •  Filters have access to the entire row to decide its fate
      –  Access to KeyValue instances to check row keys, column qualifiers, timestamps, or values
    •  Scan batching might conflict with the above and can trigger an IncompatibleFilterException
      –  Example: DependentColumnFilter
    •  There is no cross-invocation state
      –  Cannot filter rows based on other, dependent rows
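The batching conflict can be illustrated with a toy scan that refuses a batch size when its filter needs to see the whole row. This is a hypothetical model: `ToyScan`, `RowAwareFilter`, and `needsWholeRow()` are invented names, and the local `IncompatibleFilterException` only stands in for the HBase exception of the same name.

```java
// Hypothetical model of why scan batching conflicts with whole-row filters.
// None of these types are the HBase client API.
class BatchingCheckSketch {

    // Local stand-in for HBase's IncompatibleFilterException.
    static final class IncompatibleFilterException extends RuntimeException {
        IncompatibleFilterException(String msg) { super(msg); }
    }

    interface RowAwareFilter {
        // True if the filter must see the complete row before deciding.
        boolean needsWholeRow();
    }

    static final class ToyScan {
        RowAwareFilter filter;
        int batch = -1; // -1: return whole rows

        void setFilter(RowAwareFilter f) { this.filter = f; }

        void setBatch(int batch) {
            // A batch splits a row into chunks of at most 'batch' columns,
            // so a filter that needs the whole row would never see it at once.
            if (filter != null && filter.needsWholeRow()) {
                throw new IncompatibleFilterException(
                    "cannot batch a scan whose filter needs the whole row");
            }
            this.batch = batch;
        }
    }

    public static void main(String[] args) {
        ToyScan scan = new ToyScan();
        scan.setFilter(() -> true); // e.g. a dependent-column style filter
        try {
            scan.setBatch(5);
        } catch (IncompatibleFilterException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```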
  • 8. Available Filters
    •  Many filters are supplied by HBase
      –  Based on row key, column family, or column qualifier
      –  Paging through rows and columns
      –  Based on dependencies
    •  Write your own filters
      –  Use the FilterBase class to get a no-op skeleton and fill in the gaps
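The "no-op skeleton" idea can be sketched as a base class whose hooks all default to letting data through, so a custom filter overrides only what it needs. This is a simplified, hypothetical model: the hook names loosely echo HBase's `filterRowKey`/`filterKeyValue`, but `ToyFilterBase`, `filterCell`, and the string-based signatures are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a FilterBase-style no-op skeleton.
// Signatures are invented; they are not HBase's Filter interface.
class FilterBaseSketch {

    enum ReturnCode { INCLUDE, SKIP }

    // Every hook defaults to "let everything through".
    static abstract class ToyFilterBase {
        // Return true to drop the whole row based on its key.
        boolean filterRowKey(String rowKey) { return false; }

        // Decide per cell (here: per column qualifier).
        ReturnCode filterCell(String qualifier) { return ReturnCode.INCLUDE; }
    }

    // A custom filter overrides only the hook it needs.
    static final class SkipQualifierFilter extends ToyFilterBase {
        private final String skipped;

        SkipQualifierFilter(String skipped) { this.skipped = skipped; }

        @Override
        ReturnCode filterCell(String qualifier) {
            return qualifier.equals(skipped) ? ReturnCode.SKIP : ReturnCode.INCLUDE;
        }
    }

    // Applies a filter to one row: row-key check first, then each cell.
    static List<String> apply(ToyFilterBase f, String rowKey, List<String> qualifiers) {
        List<String> out = new ArrayList<>();
        if (f.filterRowKey(rowKey)) return out;
        for (String q : qualifiers) {
            if (f.filterCell(q) == ReturnCode.INCLUDE) out.add(q);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> cols = List.of("col-0", "col-1", "col-2");
        System.out.println(apply(new ToyFilterBase() {}, "row-1", cols));            // [col-0, col-1, col-2]
        System.out.println(apply(new SkipQualifierFilter("col-1"), "row-1", cols)); // [col-0, col-2]
    }
}
```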
  • 9. Agenda
    1  Introduction
    2  Comparison Filters
    3  Dedicated Filters
    4  Decorating Filters
    5  Combining Filters
    6  Custom Filters
  • 10. Comparison Filters
    •  Based on the CompareFilter class
    •  Adds the compare() method to FilterBase
    •  Takes an operator that defines how the comparison is performed
      –  Predefined by the client API
    •  Also needs a comparator to do the actual check
      –  HBase supplies a large set
  • 11. Comparison Operators
  • 12. Comparators
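The split between an operator and a comparator can be modeled in a few lines: the comparator says how a candidate sorts relative to a reference value, and the operator turns that into a keep-or-drop decision. This is a simplified, hypothetical model, not HBase's `CompareFilter`; `ByteComparator`, `binary()`, and `keep()` are invented names, and the operator set omits HBase's `NO_OP`.

```java
import java.nio.charset.StandardCharsets;

// Simplified, hypothetical model of a comparison filter: operator + comparator.
// Not HBase's CompareFilter; names and signatures are invented.
class CompareFilterSketch {

    enum CompareOp { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    interface ByteComparator {
        // Negative, zero or positive: candidate sorts before, equal to, or after the reference.
        int compare(byte[] candidate);
    }

    // Lexicographic, unsigned byte comparison against a fixed reference value.
    static ByteComparator binary(byte[] reference) {
        return candidate -> {
            int n = Math.min(candidate.length, reference.length);
            for (int i = 0; i < n; i++) {
                int a = candidate[i] & 0xff, b = reference[i] & 0xff;
                if (a != b) return a - b;
            }
            return candidate.length - reference.length;
        };
    }

    // True if 'candidate <op> reference' holds, i.e. the cell is kept.
    static boolean keep(CompareOp op, ByteComparator cmp, byte[] candidate) {
        int c = cmp.compare(candidate);
        switch (op) {
            case LESS: return c < 0;
            case LESS_OR_EQUAL: return c <= 0;
            case EQUAL: return c == 0;
            case NOT_EQUAL: return c != 0;
            case GREATER_OR_EQUAL: return c >= 0;
            case GREATER: return c > 0;
            default: throw new IllegalArgumentException(op.toString());
        }
    }

    static byte[] b(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        ByteComparator cmp = binary(b("row-22"));
        System.out.println(keep(CompareOp.LESS_OR_EQUAL, cmp, b("row-100"))); // true
        System.out.println(keep(CompareOp.LESS_OR_EQUAL, cmp, b("row-3")));   // false
    }
}
```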
  • 13. Comparison Filters (cont.)
    •  Not all combinations of operator and comparator make sense
      –  For example, the SubstringComparator returns only 0 (match) or 1 (no match)
      –  Only EQUAL and NOT_EQUAL are useful
      –  Using other operators is allowed but will most likely yield unexpected results
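Why only EQUAL and NOT_EQUAL are meaningful becomes visible once a 0/1 comparator is combined with every operator. This is a hypothetical model: `substringCompare()` mimics the "0 = match, 1 = no match" behavior described on the slide, and applying the operator directly to that result is an assumption of this sketch rather than HBase's exact internal logic.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model: a comparator that only answers 0 (match) or 1 (no match),
// combined with each operator. Not HBase's SubstringComparator implementation.
class SubstringOpSketch {

    enum CompareOp { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    // 0 if the value contains the substring, 1 otherwise.
    static int substringCompare(String value, String sub) {
        return value.contains(sub) ? 0 : 1;
    }

    static boolean keep(CompareOp op, int c) {
        switch (op) {
            case LESS: return c < 0;             // never true: c is 0 or 1
            case LESS_OR_EQUAL: return c <= 0;   // same as EQUAL here
            case EQUAL: return c == 0;
            case NOT_EQUAL: return c != 0;
            case GREATER_OR_EQUAL: return c >= 0; // always true
            case GREATER: return c > 0;          // same as NOT_EQUAL here
            default: throw new IllegalArgumentException(op.toString());
        }
    }

    static List<String> select(List<String> values, String sub, CompareOp op) {
        return values.stream()
                .filter(v -> keep(op, substringCompare(v, sub)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> values = List.of("val-1", "val-12", "other");
        for (CompareOp op : CompareOp.values()) {
            System.out.println(op + " -> " + select(values, "val-1", op));
        }
    }
}
```

LESS selects nothing and GREATER_OR_EQUAL selects everything, while the remaining operators collapse onto EQUAL or NOT_EQUAL.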
  • 14. Comparison Filters (cont.)
    •  HBase filters usually filter data out
    •  Comparison filters work in reverse: they include matching data
      –  Be mindful when selecting the comparison operator!
  • 15. Available Comparison Filters
    •  Row Filter
      –  Based on row key comparisons
    •  Family Filter
      –  Based on column family names
    •  Qualifier Filter
      –  Based on column names, aka qualifiers
    •  Value Filter
      –  Based on the actual value of a column
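The four filters differ mainly in which part of a cell they hand to the comparison. The sketch below models that with a toy `Cell` and an enum that selects the field; it is a hypothetical illustration (`CellFieldSketch`, `Kind`, and the string-typed cells are invented), not the HBase `RowFilter`/`FamilyFilter`/`QualifierFilter`/`ValueFilter` classes.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical model: row, family, qualifier and value filters differ only in
// which field of a cell they test. Not the HBase filter classes.
class CellFieldSketch {

    static final class Cell {
        final String row, family, qualifier, value;

        Cell(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "=" + value;
        }
    }

    // Which part of the cell each filter kind inspects.
    enum Kind {
        ROW(c -> c.row), FAMILY(c -> c.family), QUALIFIER(c -> c.qualifier), VALUE(c -> c.value);

        final Function<Cell, String> field;

        Kind(Function<Cell, String> field) { this.field = field; }
    }

    // Keep the cells whose selected field satisfies the test.
    static List<Cell> filter(List<Cell> cells, Kind kind, Predicate<String> test) {
        return cells.stream()
                .filter(c -> test.test(kind.field.apply(c)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("row-1", "colfam1", "col-0", "val-0"),
            new Cell("row-1", "colfam2", "col-1", "val-1"),
            new Cell("row-2", "colfam1", "col-1", "val-2"));
        System.out.println(filter(cells, Kind.ROW, r -> r.equals("row-1")));
        System.out.println(filter(cells, Kind.FAMILY, f -> f.equals("colfam1")));
        System.out.println(filter(cells, Kind.QUALIFIER, q -> q.equals("col-1")));
        System.out.println(filter(cells, Kind.VALUE, v -> v.compareTo("val-1") >= 0));
    }
}
```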
  • 16. Available Comparison Filters (cont.)
    •  Dependent Column Filter
      –  Based on the timestamp of a reference column
      –  Includes all columns that have the same timestamp
      –  Implies that the entire row is accessible, since a batched scan will not have access to the reference column
         •  No scanner batching allowed!
      –  Useful for loading interdependent changes within a row
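The dependent-column idea can be sketched as a two-pass operation over a complete row: collect the timestamps of the reference column, then keep only cells carrying one of those timestamps. This is a hypothetical model (`DependentColumnSketch` and its `Cell` are invented), and it assumes a single column family and that a row lacking the reference column yields nothing; the real `DependentColumnFilter` has further options not modeled here.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of the dependent-column idea; not HBase's DependentColumnFilter.
class DependentColumnSketch {

    static final class Cell {
        final String qualifier;
        final long ts;
        final String value;

        Cell(String qualifier, long ts, String value) {
            this.qualifier = qualifier;
            this.ts = ts;
            this.value = value;
        }

        @Override
        public String toString() { return qualifier + "@" + ts; }
    }

    // Needs the complete row: the reference cell may sort after the cells it
    // governs, which is why a batched scan (partial rows) cannot support this.
    static List<Cell> filterRow(List<Cell> row, String referenceQualifier) {
        Set<Long> stamps = new HashSet<>();
        for (Cell c : row) {
            if (c.qualifier.equals(referenceQualifier)) stamps.add(c.ts);
        }
        // With no reference cell the set is empty, so nothing is kept (assumption).
        return row.stream().filter(c -> stamps.contains(c.ts)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Cell> row = List.of(
            new Cell("col-a", 100L, "x"),
            new Cell("col-b", 100L, "y"),
            new Cell("col-c", 200L, "z"),
            new Cell("ref", 100L, "marker"));
        System.out.println(filterRow(row, "ref")); // [col-a@100, col-b@100, ref@100]
    }
}
```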
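  • 17. Example Code
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.BinaryComparator;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.Filter;
    import org.apache.hadoop.hbase.filter.RowFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    // 'table' is assumed to be an already opened HTable instance
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-0"));
    // Include only rows whose key sorts at or before "row-22"
    Filter filter = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
        new BinaryComparator(Bytes.toBytes("row-22")));
    scan.setFilter(filter);
    ResultScanner scanner = table.getScanner(scan);
    for (Result res : scanner) {
      System.out.println(res);
    }
    scanner.close();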
  • 18. Example Output
    keyvalues={row-1/colfam1:col-0/1301043190260/Put/vlen=7}
    keyvalues={row-10/colfam1:col-0/1301043190908/Put/vlen=8}
    keyvalues={row-100/colfam1:col-0/1301043195275/Put/vlen=9}
    keyvalues={row-11/colfam1:col-0/1301043190982/Put/vlen=8}
    keyvalues={row-12/colfam1:col-0/1301043191040/Put/vlen=8}
    keyvalues={row-13/colfam1:col-0/1301043191172/Put/vlen=8}
    keyvalues={row-14/colfam1:col-0/1301043191318/Put/vlen=8}
    keyvalues={row-15/colfam1:col-0/1301043191429/Put/vlen=8}
    keyvalues={row-16/colfam1:col-0/1301043191509/Put/vlen=8}
    keyvalues={row-17/colfam1:col-0/1301043191593/Put/vlen=8}
    keyvalues={row-18/colfam1:col-0/1301043191673/Put/vlen=8}
    keyvalues={row-19/colfam1:col-0/1301043191771/Put/vlen=8}
    keyvalues={row-2/colfam1:col-0/1301043190346/Put/vlen=7}
    keyvalues={row-20/colfam1:col-0/1301043191841/Put/vlen=8}
    keyvalues={row-21/colfam1:col-0/1301043191933/Put/vlen=8}
    keyvalues={row-22/colfam1:col-0/1301043191998/Put/vlen=8}
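The output lists row-100 before row-11 and stops at row-22 because row keys compare as raw bytes, not as numbers. The plain-Java sketch below reproduces the selection, assuming the table holds the keys row-1 through row-100 (consistent with the 16 rows listed); `compareBytes()` is a hand-written unsigned lexicographic comparison, not an HBase method.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simplified model, not HBase code: select row keys <= a stop key using
// byte-wise lexicographic order, then sort them the same way.
final class RowKeyOrderModel {

  /** Unsigned lexicographic comparison of two byte arrays. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) return x - y;
    }
    return a.length - b.length; // a shorter prefix sorts first
  }

  /** Returns the keys row-1..row-{count} that are <= stopKey, in byte order. */
  static List<String> rowsUpTo(int count, String stopKey) {
    byte[] stop = stopKey.getBytes(StandardCharsets.UTF_8);
    List<String> keys = new ArrayList<>();
    for (int i = 1; i <= count; i++) {
      String key = "row-" + i;
      if (compareBytes(key.getBytes(StandardCharsets.UTF_8), stop) <= 0) keys.add(key);
    }
    keys.sort((x, y) -> compareBytes(x.getBytes(StandardCharsets.UTF_8),
                                     y.getBytes(StandardCharsets.UTF_8)));
    return keys;
  }

  public static void main(String[] args) {
    List<String> rows = rowsUpTo(100, "row-22");
    System.out.println(rows.size() + " rows: " + rows);
  }
}
```

`rowsUpTo(100, "row-22")` yields the same 16 keys in the same order as the scan output, while keys such as row-3 sort after row-22 and are left out.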
  • 19. Agenda
    1  Introduction
    2  Comparison Filters
    3  Dedicated Filters
    4  Decorating Filters
    5  Combining Filters
    6  Custom Filters
  • 20. Dedicated Filters
    •  Based directly on the FilterBase class
    •  Often less useful for get() calls, since they filter entire rows
  • 21. Available Dedicated Filters
    •  Single Column Value Filter
      –  Filters rows based on one specific column
      –  Extra features
         •  “Filter if missing”
         •  “Get latest version only”
      –  The column must be part of the scan selection
         •  Or else it is all or nothing
      –  Also needs a compare operation and an optional comparator
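The “filter if missing” switch and the all-or-nothing warning can be illustrated with a toy model. In this sketch (plain Java, not the HBase class) a row is a hypothetical map from column name to its latest value, and only an EQUAL check is modeled. If the tested column is absent, which is also what happens when it is not part of the scan selection, every row passes unless `filterIfMissing` is set, in which case none does.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model, not HBase code: decides whether a row passes a single
// column value check. A row is a map from column name to its latest value.
final class SingleColumnValueModel {

  static boolean includeRow(Map<String, String> row, String column,
                            String expected, boolean filterIfMissing) {
    String value = row.get(column);
    if (value == null) {
      // Column absent (or not selected by the scan): all rows pass, or none do
      return !filterIfMissing;
    }
    return value.equals(expected); // models an EQUAL comparison only
  }

  public static void main(String[] args) {
    Map<String, String> row = new HashMap<>();
    row.put("colfam1:col-1", "val-1");
    System.out.println(includeRow(row, "colfam1:col-1", "val-1", true));  // true
    System.out.println(includeRow(row, "colfam1:col-5", "val-5", false)); // true
    System.out.println(includeRow(row, "colfam1:col-5", "val-5", true));  // false
  }
}
```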
  • 22. Available Dedicated Filters (cont.)
    •  Single Column Value Exclude Filter
      –  Same as the previous filter, but excludes the tested column from the result
    •  Prefix Filter
      –  Based on the prefix of row keys
      –  Can early out the scan!
         •  Combine with a start row for best performance
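To see why a prefix scan can end early and why a start row helps, the sketch below uses a `TreeMap` as a stand-in for a table's sorted row keys (ASCII keys assumed, so `String` order matches byte order). It models the idea rather than the PrefixFilter implementation; `ScanResult` and `sampleTable()` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simplified model, not HBase code: scan sorted keys for a prefix, stopping
// as soon as a key sorts past the prefix range.
final class PrefixScanModel {

  static final class ScanResult {
    final List<String> rows = new ArrayList<>();
    int visited; // number of row keys the scan had to look at
  }

  static ScanResult scan(TreeMap<String, String> table, String startRow, String prefix) {
    ScanResult r = new ScanResult();
    for (String key : table.tailMap(startRow, true).keySet()) {
      r.visited++;
      if (key.startsWith(prefix)) {
        r.rows.add(key);
      } else if (key.compareTo(prefix) > 0) {
        break; // early out: every later key sorts after the prefix range
      }
    }
    return r;
  }

  static TreeMap<String, String> sampleTable() {
    TreeMap<String, String> table = new TreeMap<>();
    for (int i = 1; i <= 100; i++) table.put("row-" + i, "val-" + i);
    return table;
  }

  public static void main(String[] args) {
    TreeMap<String, String> table = sampleTable();
    ScanResult fromStart = scan(table, "", "row-5");
    ScanResult fromPrefix = scan(table, "row-5", "row-5");
    System.out.println(fromStart.rows.size() + " rows, " + fromStart.visited + " keys visited");   // 11 rows, 57 keys visited
    System.out.println(fromPrefix.rows.size() + " rows, " + fromPrefix.visited + " keys visited"); // 11 rows, 12 keys visited
  }
}
```

Both scans return the same 11 rows, but starting at the prefix cuts the keys visited from 57 to 12 for this sample table.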
  • 23. Available Dedicated Filters (cont.)
    •  Page Filter
      –  Allows pagination through rows
      –  Needs to be combined with setting the start row on subsequent scans
      –  Can early out the scan when the limit is reached
    •  Key Only Filter
      –  Drops the value of every column
    •  First Key Only Filter
      –  Returns only the first column key of each row
      –  Useful for row counters or “get newest post” type applications
      –  Can early out the rest of the row scan
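Paging with a page filter boils down to remembering the last row of each page and starting the next scan just after it. The sketch below models this with a `TreeMap` (plain Java, not HBase code); it assumes that appending a zero character to the last row gives the smallest key greater than it, which holds for lexicographic ordering. `page()` and `allPages()` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simplified model, not HBase code: page through sorted row keys, starting
// each new "scan" right after the last row of the previous page.
final class PagingModel {

  static List<String> page(TreeMap<String, String> table, String startRow, int pageSize) {
    List<String> rows = new ArrayList<>();
    for (String key : table.tailMap(startRow, true).keySet()) {
      if (rows.size() == pageSize) break; // early out once the limit is reached
      rows.add(key);
    }
    return rows;
  }

  static List<List<String>> allPages(TreeMap<String, String> table, int pageSize) {
    List<List<String>> pages = new ArrayList<>();
    String startRow = "";
    while (true) {
      List<String> rows = page(table, startRow, pageSize);
      if (rows.isEmpty()) break;
      pages.add(rows);
      // Smallest key strictly greater than the last row: append a zero char
      startRow = rows.get(rows.size() - 1) + '\0';
    }
    return pages;
  }

  public static void main(String[] args) {
    TreeMap<String, String> table = new TreeMap<>();
    for (int i = 1; i <= 10; i++) table.put(String.format("row-%02d", i), "val-" + i);
    for (List<String> p : allPages(table, 4)) System.out.println(p);
  }
}
```

With ten rows and a page size of four this prints three pages, the last one holding row-09 and row-10.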
  • 24. Available Dedicated Filters (cont.)
    •  Inclusive Stop Filter
      –  As opposed to the exclusive stop row, this filter includes the final row
    •  Timestamp Filter
      –  Takes a list of timestamps to include in the result
    •  Column Count Get Filter
      –  Limits the number of columns returned by a get() call
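The difference between the exclusive stop row and an inclusive stop can be shown with `TreeMap.subMap()`, whose last flag controls whether the upper bound is included. This is a plain-Java model of the range semantics only, not the HBase filter; `scan()` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simplified model, not HBase code: a scan range with an exclusive or an
// inclusive stop row over sorted keys.
final class StopRowModel {

  static List<String> scan(TreeMap<String, String> table, String start, String stop,
                           boolean inclusiveStop) {
    // The start row is always inclusive; the stop row only if requested
    return new ArrayList<>(table.subMap(start, true, stop, inclusiveStop).keySet());
  }

  public static void main(String[] args) {
    TreeMap<String, String> table = new TreeMap<>();
    for (int i = 1; i <= 5; i++) table.put("row-" + i, "val-" + i);
    System.out.println(scan(table, "row-2", "row-4", false)); // [row-2, row-3]
    System.out.println(scan(table, "row-2", "row-4", true));  // [row-2, row-3, row-4]
  }
}
```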
  • 25. Available Dedicated Filters (cont.)
    •  Column Pagination Filter
      –  Paginates through the columns within a row
      –  Skips to the offset parameter and returns limit columns
    •  Column Prefix Filter
      –  Analogous to PrefixFilter, but matches column qualifiers
    •  Random Row Filter
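Column pagination within a single row amounts to skip-then-take over the row's sorted columns. In this plain-Java sketch `columnPage()` is a hypothetical helper whose `limit` and `offset` parameters follow the slide's wording; it is not the ColumnPaginationFilter source.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model, not HBase code: within one row, skip `offset` columns and
// return at most `limit` columns.
final class ColumnPagingModel {

  static List<String> columnPage(List<String> sortedColumns, int limit, int offset) {
    List<String> result = new ArrayList<>();
    int index = 0;
    for (String column : sortedColumns) {
      if (index >= offset + limit) break; // nothing more to return for this row
      if (index >= offset) result.add(column);
      index++;
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> columns = new ArrayList<>();
    for (int i = 0; i < 10; i++) columns.add("col-" + i);
    System.out.println(columnPage(columns, 3, 0)); // [col-0, col-1, col-2]
    System.out.println(columnPage(columns, 3, 3)); // [col-3, col-4, col-5]
    System.out.println(columnPage(columns, 3, 9)); // [col-9]
  }
}
```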
  • 26. Agenda
    1  Introduction
    2  Comparison Filters
    3  Dedicated Filters
    4  Decorating Filters
    5  Combining Filters
    6  Custom Filters
  • 27. Decorating Filters
    •  Wrap other filters to gain additional control over the returned data
    •  Skip Filter
      –  Skips the entire row when one of its columns is filtered out
      –  Not all filters are compatible
    •  While Match Filter
      –  Aborts the entire scan once the wrapped filter indicates a row or column is omitted
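The two decorating behaviors can be contrasted with a plain-Java model (not the SkipFilter or WhileMatchFilter source). Here a `Predicate` stands in for the wrapped filter; `skipRows()` drops a whole row as soon as one of its cell values fails, and `whileMatch()` ends the scan at the first row that fails. Both helpers and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Simplified model, not HBase code, of the skip and while-match behaviors.
final class DecoratingModel {

  /** Skip-style: a row survives only if every one of its cells passes the wrapped check. */
  static List<String> skipRows(Map<String, List<String>> rows, Predicate<String> cellPasses) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, List<String>> row : rows.entrySet()) {
      boolean allPass = true;
      for (String value : row.getValue()) {
        if (!cellPasses.test(value)) { allPass = false; break; } // one failure drops the row
      }
      if (allPass) result.add(row.getKey());
    }
    return result;
  }

  /** While-match-style: the scan stops at the first row the wrapped check rejects. */
  static List<String> whileMatch(List<String> rowKeys, Predicate<String> rowPasses) {
    List<String> result = new ArrayList<>();
    for (String key : rowKeys) {
      if (!rowPasses.test(key)) break; // abort the entire scan
      result.add(key);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> rows = new LinkedHashMap<>();
    rows.put("row-1", Arrays.asList("val-1", "val-2"));
    rows.put("row-2", Arrays.asList("val-0", "val-3"));
    rows.put("row-3", Arrays.asList("val-4"));
    System.out.println(skipRows(rows, v -> !v.equals("val-0"))); // [row-1, row-3]

    List<String> keys = Arrays.asList("row-1", "row-2", "row-3", "row-4");
    System.out.println(whileMatch(keys, k -> !k.equals("row-3"))); // [row-1, row-2]
  }
}
```

A plain filter with the same predicate would still return row-4; the while-match variant never gets there.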
  • 28. Agenda
    1  Introduction
    2  Comparison Filters
    3  Dedicated Filters
    4  Decorating Filters
    5  Combining Filters
    6  Custom Filters
  • 29. Combining Filters
    •  Implemented by the FilterList class
      –  Wraps a list of filters into a Filter-compatible class
      –  Takes an optional operator that decides how to handle the results of each wrapped filter (default: MUST_PASS_ALL)
  • 30. Combining Filters
    •  Filter lists can contain other filter lists
    •  The operator is fixed per list, but the hierarchy allows creating combinations
    •  Using the proper List implementation helps control the filter execution order
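  • 31.
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.filter.BinaryComparator;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.Filter;
    import org.apache.hadoop.hbase.filter.FilterList;
    import org.apache.hadoop.hbase.filter.QualifierFilter;
    import org.apache.hadoop.hbase.filter.RegexStringComparator;
    import org.apache.hadoop.hbase.filter.RowFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    List<Filter> filters = new ArrayList<Filter>();
    // Rows from "row-03" ...
    Filter filter1 = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
        new BinaryComparator(Bytes.toBytes("row-03")));
    filters.add(filter1);
    // ... up to and including "row-06"
    Filter filter2 = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
        new BinaryComparator(Bytes.toBytes("row-06")));
    filters.add(filter2);
    // Only qualifiers matching the regular expression col-0[03]
    Filter filter3 = new QualifierFilter(CompareFilter.CompareOp.EQUAL,
        new RegexStringComparator("col-0[03]"));
    filters.add(filter3);
    // Default operator: MUST_PASS_ALL
    FilterList filterList1 = new FilterList(filters);
    // …
    FilterList filterList2 = new FilterList(FilterList.Operator.MUST_PASS_ONE, filters);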
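The effect of the two list operators on these three filters can be estimated with a plain-Java model. It is a simplification, not HBase code: each filter becomes a `Predicate` over a hypothetical `Cell` (row key plus qualifier) and is evaluated per cell, whereas HBase applies row and column filters at different stages. The sample grid of rows row-01 to row-10 and qualifiers col-00 to col-04 is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Simplified model, not HBase code: MUST_PASS_ALL vs MUST_PASS_ONE applied
// per cell over a hypothetical 10 x 5 grid.
final class FilterListModel {

  enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

  static final class Cell {
    final String row, qualifier;
    Cell(String row, String qualifier) { this.row = row; this.qualifier = qualifier; }
  }

  static boolean passes(Operator op, List<Predicate<Cell>> filters, Cell cell) {
    for (Predicate<Cell> f : filters) {
      boolean ok = f.test(cell);
      if (op == Operator.MUST_PASS_ALL && !ok) return false; // one failure is enough
      if (op == Operator.MUST_PASS_ONE && ok) return true;   // one success is enough
    }
    return op == Operator.MUST_PASS_ALL;
  }

  static int count(Operator op, List<Predicate<Cell>> filters) {
    int n = 0;
    for (int r = 1; r <= 10; r++) {
      for (int q = 0; q < 5; q++) {
        if (passes(op, filters, new Cell(String.format("row-%02d", r), "col-0" + q))) n++;
      }
    }
    return n;
  }

  static List<Predicate<Cell>> slideFilters() {
    Pattern qualifier = Pattern.compile("col-0[03]");
    List<Predicate<Cell>> filters = new ArrayList<>();
    filters.add(c -> c.row.compareTo("row-03") >= 0);        // like filter1
    filters.add(c -> c.row.compareTo("row-06") <= 0);        // like filter2
    filters.add(c -> qualifier.matcher(c.qualifier).find()); // like filter3
    return filters;
  }

  public static void main(String[] args) {
    List<Predicate<Cell>> filters = slideFilters();
    System.out.println(count(Operator.MUST_PASS_ALL, filters)); // 8
    System.out.println(count(Operator.MUST_PASS_ONE, filters)); // 50
  }
}
```

Under MUST_PASS_ALL only the 8 cells in rows row-03 to row-06 with qualifiers col-00 or col-03 survive. Under MUST_PASS_ONE all 50 cells pass, because every row key is either at least row-03 or at most row-06, so the two row filters alone already let everything through.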
  • 32. Agenda
    1  Introduction
    2  Comparison Filters
    3  Dedicated Filters
    4  Decorating Filters
    5  Combining Filters
    6  Custom Filters
  • 33. Custom Filter
    •  Allows users to add missing filters
    •  Either implement the Filter interface or extend the FilterBase skeleton
    •  Provides hooks called at different stages of the read process
  • 34. Filter Interface
    import java.util.List;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.io.Writable;

    public interface Filter extends Writable {
      public enum ReturnCode {
        INCLUDE, SKIP, NEXT_COL, NEXT_ROW, SEEK_NEXT_USING_HINT
      }
      public void reset();                                   // reset state between rows
      public boolean filterRowKey(byte[] buffer,
          int offset, int length);                           // true excludes the row by its key
      public boolean filterAllRemaining();                   // true ends the scan
      public ReturnCode filterKeyValue(KeyValue v);          // per-KeyValue decision
      public void filterRow(List<KeyValue> kvs);             // chance to alter the row's cells
      public boolean hasFilterRow();                         // true if filterRow(List) is used
      public boolean filterRow();                            // true excludes the whole row
      public KeyValue getNextKeyHint(KeyValue currentKV);    // seek target for SEEK_NEXT_USING_HINT
    }
  • 35. Filter Return Codes
  • 36. Merge Reads
  • 37. Filter Flow
    •  Filter hooks are called at different stages
    •  Seeks are done initially to find the next KeyValue
      –  A hint from the previous filter invocation might help
    •  Early out checks improve performance
  • 38. Example Code
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.filter.FilterBase;
    import org.apache.hadoop.hbase.util.Bytes;

    // Includes a row only if at least one of its cells has the given value
    public class CustomFilter extends FilterBase {
      private byte[] value = null;
      private boolean filterRow = true;

      public CustomFilter() { super(); }  // no-arg constructor for deserialization
      public CustomFilter(byte[] value) { this.value = value; }

      @Override
      public void reset() { this.filterRow = true; }  // every row starts as "filtered"

      @Override
      public ReturnCode filterKeyValue(KeyValue kv) {
        if (Bytes.compareTo(value, kv.getValue()) == 0) {
          filterRow = false;  // found a matching value: keep this row
        }
        return ReturnCode.INCLUDE;  // include every cell, the row decision comes later
      }

      @Override
      public boolean filterRow() { return filterRow; }  // true excludes the whole row

      // ...
    }
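How the hooks of the CustomFilter above play together per row can be traced with a simplified, plain-Java driver. It is a model, not HBase's region server code: it calls only reset(), filterKeyValue() and filterRow(), in that order, for each row, uses strings instead of KeyValue objects, and `ValueMatchFilter` is a hypothetical re-creation of the slide's filter logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model, not HBase code: drive a value-matching filter through
// reset(), filterKeyValue() and filterRow() for every row.
final class CustomFilterModel {

  enum ReturnCode { INCLUDE, SKIP }

  static final class ValueMatchFilter {
    private final String value;
    private boolean filterRow = true;
    ValueMatchFilter(String value) { this.value = value; }
    void reset() { filterRow = true; }
    ReturnCode filterKeyValue(String cellValue) {
      if (value.equals(cellValue)) filterRow = false; // found a match in this row
      return ReturnCode.INCLUDE;
    }
    boolean filterRow() { return filterRow; } // true means: drop the whole row
  }

  /** Simplified driver: returns the keys of rows that survive the filter. */
  static List<String> scan(Map<String, List<String>> table, ValueMatchFilter filter) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, List<String>> row : table.entrySet()) {
      filter.reset(); // fresh state for every row
      List<String> kept = new ArrayList<>();
      for (String cellValue : row.getValue()) {
        if (filter.filterKeyValue(cellValue) == ReturnCode.INCLUDE) kept.add(cellValue);
      }
      if (!filter.filterRow() && !kept.isEmpty()) result.add(row.getKey());
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> table = new LinkedHashMap<>();
    table.put("row-1", Arrays.asList("val-1", "val-2"));
    table.put("row-2", Arrays.asList("val-3", "val-4"));
    table.put("row-3", Arrays.asList("val-2", "val-5"));
    System.out.println(scan(table, new ValueMatchFilter("val-2"))); // [row-1, row-3]
  }
}
```

Without the reset() call, row-2 would inherit the match found in row-1 and slip through, which is why the per-row state must be cleared.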
  • 39. Deploying Custom Filters
    •  Provide a JAR file with the filter class
    •  Deploy the JAR to the RegionServers
    •  Add the JAR to HBASE_CLASSPATH
    •  Restart the RegionServers
    •  Tip: Testing on a cluster is more involved, so test on a local machine first
  • 40. Summary
  • 41. Summary (cont.)