Bigtable Row Filters
Warning
gRPC is required for using the Cloud Bigtable API. As of May 2016,
grpcio is only supported in Python 2.7, so importing
gcloud.bigtable in other versions of Python will fail.
A RowFilter can be used when adding mutations to a ConditionalRow and when reading row data with read_row() or read_rows().
As laid out in the RowFilter definition, the following basic filters are provided:
- SinkFilter
- PassAllFilter
- BlockAllFilter
- RowKeyRegexFilter
- RowSampleFilter
- FamilyNameRegexFilter
- ColumnQualifierRegexFilter
- TimestampRangeFilter
- ColumnRangeFilter
- ValueRegexFilter
- ValueRangeFilter
- CellsRowOffsetFilter
- CellsRowLimitFilter
- CellsColumnLimitFilter
- StripValueTransformerFilter
- ApplyLabelFilter
In addition, these filters can be combined into composite filters with RowFilterChain, RowFilterUnion and ConditionalRowFilter.
These rules can be nested arbitrarily, with a basic filter at the lowest level. For example:
from gcloud.bigtable.row_filters import ApplyLabelFilter
from gcloud.bigtable.row_filters import ColumnQualifierRegexFilter
from gcloud.bigtable.row_filters import RowFilterChain
from gcloud.bigtable.row_filters import RowFilterUnion

# Filter in a specified column (matching any column family).
col1_filter = ColumnQualifierRegexFilter(b'columnbia')
# Create a filter to label results.
label1 = u'label-red'
label1_filter = ApplyLabelFilter(label1)
# Combine the filters to label all the cells in columnbia.
chain1 = RowFilterChain(filters=[col1_filter, label1_filter])
# Create a similar filter to label cells blue.
col2_filter = ColumnQualifierRegexFilter(b'columnseeya')
label2 = u'label-blue'
label2_filter = ApplyLabelFilter(label2)
chain2 = RowFilterChain(filters=[col2_filter, label2_filter])
# Bring our two labeled columns together.
row_filter = RowFilterUnion(filters=[chain1, chain2])
Filters for Google Cloud Bigtable Row classes.
-
class gcloud.bigtable.row_filters.ApplyLabelFilter(label)
Bases: gcloud.bigtable.row_filters.RowFilter
Filter to apply labels to cells.
Intended to be used as an intermediate filter on a pre-existing filtered result set. When two result sets are combined, the label tells the client which part of the filter produced each cell.
Note
Due to a technical limitation of the backend, it is not currently possible to apply multiple labels to a cell.
Parameters: label (str) – Label to apply to cells in the output row. Values must be at most 15 characters long, and match the pattern [a-z0-9\-]+.
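The label constraints above can be checked on the client before a filter is built. The following is a minimal sketch, not part of the library: the helper name is_valid_label is hypothetical, and the pattern and length limit are copied from the parameter description.

```python
import re

# Pattern and length limit taken from the parameter description above.
# \Z anchors at the very end so a trailing newline is rejected.
_LABEL_PATTERN = re.compile(r'[a-z0-9\-]+\Z')
_MAX_LABEL_LENGTH = 15


def is_valid_label(label):
    """Return True if ``label`` satisfies the documented constraints.

    Hypothetical helper; the library itself is not assumed to expose it.
    """
    return (len(label) <= _MAX_LABEL_LENGTH
            and _LABEL_PATTERN.match(label) is not None)
```

Under this sketch, is_valid_label(u'label-red') passes, while an uppercase or 16-character label is rejected.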
-
class gcloud.bigtable.row_filters.BlockAllFilter(flag)
Bases: gcloud.bigtable.row_filters._BoolFilter
Row filter that doesn’t match any cells.
Parameters: flag (bool) – Does not match any cells, regardless of input. Useful for temporarily disabling just part of a filter.
-
class gcloud.bigtable.row_filters.CellsColumnLimitFilter(num_cells)
Bases: gcloud.bigtable.row_filters._CellCountFilter
Row filter to limit cells in a column.
Parameters: num_cells (int) – Matches only the most recent N cells within each column. This filters a (family name, column) pair, based on timestamps of each cell.
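To make the per-column behavior concrete, here is a small pure-Python model of what the description implies. It is only a sketch: the (family, qualifier, timestamp, value) tuple layout and the function name are assumptions for illustration, and the real filtering happens on the server.

```python
from collections import defaultdict


def limit_cells_per_column(cells, num_cells):
    """Keep the ``num_cells`` most recent cells of each (family, qualifier).

    ``cells`` is a list of (family, qualifier, timestamp, value) tuples;
    this layout is an assumption made for the sketch.
    """
    by_column = defaultdict(list)
    for cell in cells:
        by_column[(cell[0], cell[1])].append(cell)

    kept = []
    for column in sorted(by_column):
        # Newest timestamp first, then truncate to the requested count.
        newest_first = sorted(
            by_column[column], key=lambda c: c[2], reverse=True)
        kept.extend(newest_first[:num_cells])
    return kept
```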
-
class gcloud.bigtable.row_filters.CellsRowLimitFilter(num_cells)
Bases: gcloud.bigtable.row_filters._CellCountFilter
Row filter to limit cells in a row.
Parameters: num_cells (int) – Matches only the first N cells of the row.
-
class gcloud.bigtable.row_filters.CellsRowOffsetFilter(num_cells)
Bases: gcloud.bigtable.row_filters._CellCountFilter
Row filter to skip cells in a row.
Parameters: num_cells (int) – Skips the first N cells of the row.
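Applying an offset filter followed by a limit filter is one way to page through a wide row. The sketch below models that combination with plain list slices; the function names are hypothetical and the row is assumed to be an ordered list of cells.

```python
def skip_cells(cells, num_cells):
    """Model of CellsRowOffsetFilter: drop the first ``num_cells`` cells."""
    return cells[num_cells:]


def take_cells(cells, num_cells):
    """Model of CellsRowLimitFilter: keep the first ``num_cells`` cells."""
    return cells[:num_cells]


def page_of_cells(cells, page, page_size):
    """Skip whole pages first, then keep one page (offset, then limit)."""
    return take_cells(skip_cells(cells, page * page_size), page_size)
```

The order matters: limiting before skipping would truncate the row first and could leave nothing to return for later pages.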
-
class gcloud.bigtable.row_filters.ColumnQualifierRegexFilter(regex)
Bases: gcloud.bigtable.row_filters._RegexFilter
Row filter for a column qualifier regular expression.
The regex must be a valid RE2 pattern. See Google’s RE2 reference for the accepted syntax.
Note
Take special care with the expression used. Since each of these properties can contain arbitrary bytes, the \C escape sequence must be used if a true wildcard is desired. The . character will not match the newline character \n, which may be present in a binary value.
Parameters: regex (bytes) – A regular expression (RE2) to match cells from columns that match this regex (irrespective of column family).
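The newline caveat is easy to reproduce locally. The sketch below uses Python's own re module as an analogy only: it is not RE2, it does not support \C, and the qualifier value is made up for illustration. It shows the same default behavior of the . character on binary data.

```python
import re

# A made-up qualifier that contains a newline byte.
qualifier = b'col\nname'

# By default "." does not match the newline byte, so this fails.
dot_matches = re.match(br'col.name\Z', qualifier) is not None

# Python's closest equivalent of a true wildcard is the DOTALL flag;
# RE2 (used by Bigtable) offers the \C escape for this purpose instead.
dotall_matches = re.match(br'col.name\Z', qualifier, re.DOTALL) is not None
```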
-
class gcloud.bigtable.row_filters.ColumnRangeFilter(column_family_id, start_column=None, end_column=None, inclusive_start=None, inclusive_end=None)
Bases: gcloud.bigtable.row_filters.RowFilter
A row filter to restrict to a range of columns.
Both the start and end column can be included or excluded in the range. By default, we include them both, but this can be changed with optional flags.
Parameters:
- column_family_id (str) – The column family that contains the columns. Must be of the form [_a-zA-Z0-9][-_.a-zA-Z0-9]*.
- start_column (bytes) – The start of the range of columns. If no value is used, the backend applies no lower bound to the values.
- end_column (bytes) – The end of the range of columns. If no value is used, the backend applies no upper bound to the values.
- inclusive_start (bool) – Boolean indicating if the start column should be included in the range (or excluded). Defaults to True if start_column is passed and no inclusive_start was given.
- inclusive_end (bool) – Boolean indicating if the end column should be included in the range (or excluded). Defaults to True if end_column is passed and no inclusive_end was given.
Raises: ValueError if inclusive_start is set but no start_column is given or if inclusive_end is set but no end_column is given.
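The bound and flag rules described for range filters can be summarized in a short pure-Python model. This is a sketch of the documented behavior, not the library's code: the function name in_range is hypothetical, and byte-wise lexicographic comparison of qualifiers is an assumption.

```python
def in_range(value, start=None, end=None,
             inclusive_start=None, inclusive_end=None):
    """Return True if ``value`` falls inside the described range."""
    # An inclusive flag without its bound is rejected, as documented.
    if inclusive_start is not None and start is None:
        raise ValueError('inclusive_start was set but no start was given')
    if inclusive_end is not None and end is None:
        raise ValueError('inclusive_end was set but no end was given')

    # A bound passed without a flag is treated as inclusive.
    if inclusive_start is None:
        inclusive_start = True
    if inclusive_end is None:
        inclusive_end = True

    # A missing bound means the range is open on that side.
    if start is not None:
        if value < start or (value == start and not inclusive_start):
            return False
    if end is not None:
        if value > end or (value == end and not inclusive_end):
            return False
    return True
```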
-
class gcloud.bigtable.row_filters.ConditionalRowFilter(base_filter, true_filter=None, false_filter=None)
Bases: gcloud.bigtable.row_filters.RowFilter
Conditional row filter which exhibits ternary behavior.
Executes one of two filters based on another filter. If the base_filter returns any cells in the row, then true_filter is executed. If not, then false_filter is executed.
Note
The base_filter does not execute atomically with the true and false filters, which may lead to inconsistent or unexpected results.
Additionally, executing a ConditionalRowFilter has poor performance on the server, especially when false_filter is set.
Parameters:
- base_filter (RowFilter) – The filter to condition on before executing the true/false filters.
- true_filter (RowFilter) – (Optional) The filter to execute if there are any cells matching base_filter. If not provided, no results will be returned in the true case.
- false_filter (RowFilter) – (Optional) The filter to execute if there are no cells matching base_filter. If not provided, no results will be returned in the false case.
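The ternary behavior can be modeled locally with filters represented as plain functions from a list of cells to a list of cells. This is a sketch under that assumption; conditional and only are hypothetical helpers, and cells are modeled as (qualifier, value) tuples.

```python
def conditional(base_filter, true_filter=None, false_filter=None):
    """Model of the ternary behavior described above."""
    def apply(cells):
        # Any cell surviving base_filter selects the true branch.
        chosen = true_filter if base_filter(cells) else false_filter
        # A missing branch yields no results, as documented.
        return chosen(cells) if chosen is not None else []
    return apply


def only(qualifier):
    """Keep cells whose qualifier equals ``qualifier`` (illustrative)."""
    return lambda cells: [c for c in cells if c[0] == qualifier]


# Return the 'name' cells only for rows that have a 'status' cell.
row_filter = conditional(only('status'), true_filter=only('name'))
```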
-
class gcloud.bigtable.row_filters.FamilyNameRegexFilter(regex)
Bases: gcloud.bigtable.row_filters._RegexFilter
Row filter for a family name regular expression.
The regex must be a valid RE2 pattern. See Google’s RE2 reference for the accepted syntax.
Parameters: regex (str) – A regular expression (RE2) to match cells from columns in a given column family. For technical reasons, the regex must not contain the ':' character, even if it is not being used as a literal.
-
class gcloud.bigtable.row_filters.PassAllFilter(flag)
Bases: gcloud.bigtable.row_filters._BoolFilter
Row filter equivalent to not filtering at all.
Parameters: flag (bool) – Matches all cells, regardless of input. Functionally equivalent to leaving filter unset, but included for completeness.
-
class gcloud.bigtable.row_filters.RowFilter
Bases: object
Basic filter to apply to cells in a row.
These values can be combined via RowFilterChain, RowFilterUnion and ConditionalRowFilter.
Note
This class is a do-nothing base class for all row filters.
-
class gcloud.bigtable.row_filters.RowFilterChain(filters=None)
Bases: gcloud.bigtable.row_filters._FilterCombination
Chain of row filters.
Sends rows through several filters in sequence. The filters are “chained” together to process a row. After the first filter is applied, the second is applied to the filtered output and so on for subsequent filters.
Parameters: filters (list) – List of RowFilter
-
class gcloud.bigtable.row_filters.RowFilterUnion(filters=None)
Bases: gcloud.bigtable.row_filters._FilterCombination
Union of row filters.
Sends rows through several filters simultaneously, then merges / interleaves all the filtered results together.
If multiple cells are produced with the same column and timestamp, they will all appear in the output row in an unspecified mutual order.
Parameters: filters (list) – List of RowFilter
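The difference between a chain and a union can be sketched with filters modeled as plain functions over a list of cells. Everything here is illustrative: chain, union, qualifier_is and apply_label are hypothetical stand-ins, cells are (qualifier, value, label) tuples, and the union simply concatenates results, whereas the real service leaves the mutual order of some cells unspecified.

```python
def chain(filters):
    """Feed the output of each filter into the next one."""
    def apply(cells):
        for row_filter in filters:
            cells = row_filter(cells)
        return cells
    return apply


def union(filters):
    """Run every filter on the same input and merge the outputs."""
    def apply(cells):
        merged = []
        for row_filter in filters:
            merged.extend(row_filter(cells))
        return merged
    return apply


def qualifier_is(name):
    return lambda cells: [c for c in cells if c[0] == name]


def apply_label(label):
    return lambda cells: [(c[0], c[1], label) for c in cells]


row_filter = union([
    chain([qualifier_is(b'columnbia'), apply_label('label-red')]),
    chain([qualifier_is(b'columnseeya'), apply_label('label-blue')]),
])
cells = [
    (b'columnbia', b'v1', None),
    (b'columnseeya', b'v2', None),
    (b'other', b'v3', None),
]
labeled = row_filter(cells)
```

Each chain narrows its input and then labels it, while the union keeps both labeled subsets and drops the cell that neither chain selected.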
-
class gcloud.bigtable.row_filters.RowKeyRegexFilter(regex)
Bases: gcloud.bigtable.row_filters._RegexFilter
Row filter for a row key regular expression.
The regex must be a valid RE2 pattern. See Google’s RE2 reference for the accepted syntax.
Note
Take special care with the expression used. Since each of these properties can contain arbitrary bytes, the \C escape sequence must be used if a true wildcard is desired. The . character will not match the newline character \n, which may be present in a binary value.
Parameters: regex (bytes) – A regular expression (RE2) to match cells from rows with row keys that satisfy this regex. For a CheckAndMutateRowRequest, this filter is unnecessary since the row key is already specified.
-
class gcloud.bigtable.row_filters.RowSampleFilter(sample)
Bases: gcloud.bigtable.row_filters.RowFilter
Matches all cells from a row with probability p.
Parameters: sample (float) – The probability of matching a cell (must be in the interval [0, 1]).
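As a rough local model of row sampling, each row can be kept or dropped as a whole by one random draw. This is a sketch only: sample_rows is a hypothetical helper, and how the server actually draws its samples is not specified here.

```python
import random


def sample_rows(row_keys, sample, rng=None):
    """Keep each row key independently with probability ``sample``."""
    if not 0.0 <= sample <= 1.0:
        raise ValueError('sample must be in the interval [0, 1]')
    rng = rng or random.Random()
    # random() returns a float in [0.0, 1.0), so 0 keeps nothing
    # and 1 keeps everything.
    return [key for key in row_keys if rng.random() < sample]
```

Passing a seeded random.Random instance makes the selection repeatable, which is convenient in tests.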
-
class gcloud.bigtable.row_filters.SinkFilter(flag)
Bases: gcloud.bigtable.row_filters._BoolFilter
Advanced row filter to skip parent filters.
Parameters: flag (bool) – ADVANCED USE ONLY. Hook for introspection into the row filter. Outputs all cells directly to the output of the read rather than to any parent filter. Cannot be used within the predicate_filter, true_filter, or false_filter of a ConditionalRowFilter.
-
class gcloud.bigtable.row_filters.StripValueTransformerFilter(flag)
Bases: gcloud.bigtable.row_filters._BoolFilter
Row filter that transforms cells into empty string (0 bytes).
Parameters: flag (bool) – If True, replaces each cell’s value with the empty string. As the name indicates, this is more useful as a transformer than a generic query / filter.
-
class gcloud.bigtable.row_filters.TimestampRange(start=None, end=None)
Bases: object
Range of time with inclusive lower and exclusive upper bounds.
Parameters:
- start (datetime.datetime) – (Optional) The (inclusive) lower bound of the timestamp range. If omitted, defaults to Unix epoch.
- end (datetime.datetime) – (Optional) The (exclusive) upper bound of the timestamp range. If omitted, no upper bound is used.
-
to_pb()
Converts the TimestampRange to a protobuf.
Return type: data_v2_pb2.TimestampRange
Returns: The converted current object.
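The half-open interval (inclusive start, exclusive end) can be illustrated with a small stand-alone check. This is a sketch with a hypothetical function name; it treats a missing start as "no lower check", which matches the Unix epoch default only on the assumption that cell timestamps are never earlier than the epoch.

```python
import datetime


def in_timestamp_range(timestamp, start=None, end=None):
    """Return True if ``timestamp`` lies in the half-open [start, end)."""
    if start is not None and timestamp < start:
        return False
    # The upper bound is exclusive: a cell exactly at ``end`` is left out.
    if end is not None and timestamp >= end:
        return False
    return True


start = datetime.datetime(2016, 5, 1)
end = datetime.datetime(2016, 6, 1)
```

With these bounds, a cell stamped exactly at start is matched and a cell stamped exactly at end is not.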
-
class gcloud.bigtable.row_filters.TimestampRangeFilter(range_)
Bases: gcloud.bigtable.row_filters.RowFilter
Row filter that limits cells to a range of time.
Parameters: range_ (TimestampRange) – Range of time that cells should match against.
-
class gcloud.bigtable.row_filters.ValueRangeFilter(start_value=None, end_value=None, inclusive_start=None, inclusive_end=None)
Bases: gcloud.bigtable.row_filters.RowFilter
A range of values to restrict to in a row filter.
Will only match cells that have values in this range.
Both the start and end value can be included or excluded in the range. By default, we include them both, but this can be changed with optional flags.
Parameters:
- start_value (bytes) – The start of the range of values. If no value is used, the backend applies no lower bound to the values.
- end_value (bytes) – The end of the range of values. If no value is used, the backend applies no upper bound to the values.
- inclusive_start (bool) – Boolean indicating if the start value should be included in the range (or excluded). Defaults to True if start_value is passed and no inclusive_start was given.
- inclusive_end (bool) – Boolean indicating if the end value should be included in the range (or excluded). Defaults to True if end_value is passed and no inclusive_end was given.
Raises: ValueError if inclusive_start is set but no start_value is given or if inclusive_end is set but no end_value is given.
-
class gcloud.bigtable.row_filters.ValueRegexFilter(regex)
Bases: gcloud.bigtable.row_filters._RegexFilter
Row filter for a value regular expression.
The regex must be a valid RE2 pattern. See Google’s RE2 reference for the accepted syntax.
Note
Take special care with the expression used. Since each of these properties can contain arbitrary bytes, the \C escape sequence must be used if a true wildcard is desired. The . character will not match the newline character \n, which may be present in a binary value.
Parameters: regex (bytes) – A regular expression (RE2) to match cells with values that match this regex.