Using the API
=============

Authentication / Configuration
------------------------------

- Use :class:`Client` objects to configure your applications.

- :class:`Client` objects hold both a ``project`` and an authenticated
  connection to the BigQuery service.

- The authentication credentials can be implicitly determined from the
  environment or directly via :meth:`from_service_account_json` and
  :meth:`from_service_account_p12`.

- After setting :envvar:`GOOGLE_APPLICATION_CREDENTIALS` and
  :envvar:`GOOGLE_CLOUD_PROJECT` environment variables, create an instance of
  :class:`Client`.

  .. doctest::

     >>> from google.cloud import bigquery
     >>> client = bigquery.Client()

Projects
--------

A project is the top-level container in the ``BigQuery`` API: it is tied
closely to billing, and can provide default access control across all its
datasets.  If no ``project`` is passed to the client constructor, the library
attempts to infer a project using the environment (including explicit
environment variables, GAE, and GCE).

To override the project inferred from the environment, pass an explicit
``project`` to the constructor, or to either of the alternative
``classmethod`` factories:

.. doctest::

   >>> from google.cloud import bigquery
   >>> client = bigquery.Client(project='PROJECT_ID')

Project ACLs
~~~~~~~~~~~~

Each project has an access control list granting reader / writer / owner
permission to one or more entities.  This list cannot be queried or set via
the API: it must be managed using the Google Developer Console.

Datasets
--------

A dataset represents a collection of tables, and applies several default
policies to tables as they are created:

- An access control list (ACL).  When created, a dataset has an ACL which
  maps to the ACL inherited from its project.

- A default table expiration period.  If set, tables created within the
  dataset will use this value as their expiration period.

Dataset operations
~~~~~~~~~~~~~~~~~~

List datasets for the client's project:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START client_list_datasets]
   :end-before: [END client_list_datasets]

Create a new dataset for the client's project:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START dataset_create]
   :end-before: [END dataset_create]

Check for the existence of a dataset:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START dataset_exists]
   :end-before: [END dataset_exists]

Refresh metadata for a dataset (to pick up changes made by another client):

.. literalinclude:: bigquery_snippets.py
   :start-after: [START dataset_reload]
   :end-before: [END dataset_reload]

Patch metadata for a dataset:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START dataset_patch]
   :end-before: [END dataset_patch]

Replace the ACL for a dataset, and update all writable fields:

.. doctest::

   >>> from google.cloud import bigquery
   >>> client = bigquery.Client()
   >>> dataset = client.dataset('dataset_name')
   >>> dataset.reload()  # API request
   >>> acl = list(dataset.acl)
   >>> acl.append(bigquery.Access(role='READER', entity_type='domain', entity='example.com'))
   >>> dataset.acl = acl
   >>> dataset.update()  # API request

Delete a dataset:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START dataset_delete]
   :end-before: [END dataset_delete]
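Putting the dataset operations above together, the following is a minimal
sketch of a dataset lifecycle.  The ``'dataset_name'`` ID and the one-day
expiration are illustrative, and ``default_table_expiration_ms`` is assumed to
be the property backing the default expiration policy described earlier;
adjust the names to your library version:

.. doctest::

   >>> from google.cloud import bigquery
   >>> client = bigquery.Client()
   >>> dataset = client.dataset('dataset_name')  # hypothetical dataset ID
   >>> dataset.default_table_expiration_ms = 24 * 60 * 60 * 1000  # assumed property: one day
   >>> if not dataset.exists():  # API request
   ...     dataset.create()      # API request
   >>> dataset.delete()          # API request: clean up when finished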
Tables
------

Tables exist within datasets.  List tables for the dataset:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START dataset_list_tables]
   :end-before: [END dataset_list_tables]

Create a table:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START table_create]
   :end-before: [END table_create]

Check for the existence of a table:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START table_exists]
   :end-before: [END table_exists]

Refresh metadata for a table (to pick up changes made by another client):

.. literalinclude:: bigquery_snippets.py
   :start-after: [START table_reload]
   :end-before: [END table_reload]

Patch specific properties for a table:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START table_patch]
   :end-before: [END table_patch]

Update all writable metadata for a table:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START table_update]
   :end-before: [END table_update]

Get rows from a table's data:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START table_fetch_data]
   :end-before: [END table_fetch_data]

Insert rows into a table's data:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START table_insert_data]
   :end-before: [END table_insert_data]

Upload table data from a file:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START table_upload_from_file]
   :end-before: [END table_upload_from_file]

Delete a table:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START table_delete]
   :end-before: [END table_delete]
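As a compact sketch of the table workflow above (define a schema, create the
table, then stream rows into it), using the ``person_ages`` table that the job
examples below also use.  The sample rows are invented, and ``insert_data`` is
assumed to return an empty list of errors on success:

.. doctest::

   >>> from google.cloud import bigquery
   >>> from google.cloud.bigquery import SchemaField
   >>> client = bigquery.Client()
   >>> dataset = client.dataset('dataset_name')
   >>> table = dataset.table(name='person_ages')
   >>> table.schema = [
   ...     SchemaField('full_name', 'STRING', mode='required'),
   ...     SchemaField('age', 'INTEGER', mode='required')]
   >>> table.create()  # API request
   >>> rows = [('Abel Adamson', 27), ('Beatrice Burns', 34)]  # invented sample data
   >>> errors = table.insert_data(rows)  # API request
   >>> errors
   []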
Jobs
----

Jobs describe actions performed on data in BigQuery tables:

- Load data into a table
- Run a query against data in one or more tables
- Extract data from a table
- Copy a table

List jobs for a project:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START client_list_jobs]
   :end-before: [END client_list_jobs]

Querying data (synchronous)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Run a query which can be expected to complete within bounded time:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START client_run_sync_query]
   :end-before: [END client_run_sync_query]

If the rows returned by the query do not fit into the initial response, then
we need to fetch the remaining rows via ``fetch_data``:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START client_run_sync_query_paged]
   :end-before: [END client_run_sync_query_paged]

If the query takes longer than the timeout allowed, ``query.complete`` will be
``False``.  In that case, we need to poll the associated job until it is done,
and then fetch the results:

.. literalinclude:: bigquery_snippets.py
   :start-after: [START client_run_sync_query_timeout]
   :end-before: [END client_run_sync_query_timeout]
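For orientation, here is a minimal sketch of the synchronous flow described
above using ``run_sync_query``.  The query text and the ten-second
``timeout_ms`` are illustrative assumptions, and the result handling shown is
the simple case where ``complete`` is ``True`` and all rows fit into the first
response:

.. doctest::

   >>> from google.cloud import bigquery
   >>> client = bigquery.Client()
   >>> query = client.run_sync_query(
   ...     'SELECT full_name, age FROM dataset_name.person_ages')
   >>> query.timeout_ms = 10000  # assumed setting: wait up to ten seconds
   >>> query.run()               # API request
   >>> query.complete
   True
   >>> for row in query.rows:
   ...     print(row)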
Querying data (asynchronous)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Background a query, loading the results into a table:

.. doctest::

   >>> from google.cloud import bigquery
   >>> client = bigquery.Client()
   >>> query = """\
   ... SELECT first_name + ' ' + last_name AS full_name,
   ...        FLOOR(DATEDIFF(CURRENT_DATE(), birth_date) / 365) AS age
   ...   FROM dataset_name.persons
   ... """
   >>> dataset = client.dataset('dataset_name')
   >>> table = dataset.table(name='person_ages')
   >>> job = client.run_async_query('fullname-age-query-job', query)
   >>> job.destination = table
   >>> job.write_disposition = 'truncate'
   >>> job.name
   'fullname-age-query-job'
   >>> job.job_type
   'query'
   >>> job.created
   None
   >>> job.state
   None

.. note::

   - ``google.cloud.bigquery`` generates a UUID for each job.
   - The ``created`` and ``state`` fields are not set until the job is
     submitted to the BigQuery back-end.

Then, begin executing the job on the server:

.. doctest::

   >>> job.begin()  # API call
   >>> job.created
   datetime.datetime(2015, 7, 23, 9, 30, 20, 268260, tzinfo=<UTC>)
   >>> job.state
   'RUNNING'

Poll until the job is complete:

.. doctest::

   >>> import time
   >>> retry_count = 100
   >>> while retry_count > 0 and job.state != 'DONE':
   ...     retry_count -= 1
   ...     time.sleep(10)
   ...     job.reload()  # API call
   >>> job.state
   'DONE'
   >>> job.ended
   datetime.datetime(2015, 7, 23, 9, 30, 21, 334792, tzinfo=<UTC>)

Inserting data (asynchronous)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Start a job loading data asynchronously from a set of CSV files, located on
Google Cloud Storage, appending rows into an existing table.  First, create
the job locally:

.. doctest::

   >>> from google.cloud import bigquery
   >>> from google.cloud.bigquery import SchemaField
   >>> client = bigquery.Client()
   >>> dataset = client.dataset('dataset_name')
   >>> table = dataset.table(name='person_ages')
   >>> table.schema = [
   ...     SchemaField('full_name', 'STRING', mode='required'),
   ...     SchemaField('age', 'INTEGER', mode='required')]
   >>> job = client.load_table_from_storage(
   ...     'load-from-storage-job', table, 'gs://bucket-name/object-prefix*')
   >>> job.source_format = 'CSV'
   >>> job.skip_leading_rows = 1  # count of skipped header rows
   >>> job.write_disposition = 'truncate'
   >>> job.name
   'load-from-storage-job'
   >>> job.job_type
   'load'
   >>> job.created
   None
   >>> job.state
   None

.. note::

   - ``google.cloud.bigquery`` generates a UUID for each job.
   - The ``created`` and ``state`` fields are not set until the job is
     submitted to the BigQuery back-end.

Then, begin executing the job on the server:

.. doctest::

   >>> job.begin()  # API call
   >>> job.created
   datetime.datetime(2015, 7, 23, 9, 30, 20, 268260, tzinfo=<UTC>)
   >>> job.state
   'RUNNING'

Poll until the job is complete:

.. doctest::

   >>> import time
   >>> retry_count = 100
   >>> while retry_count > 0 and job.state != 'DONE':
   ...     retry_count -= 1
   ...     time.sleep(10)
   ...     job.reload()  # API call
   >>> job.state
   'DONE'
   >>> job.ended
   datetime.datetime(2015, 7, 23, 9, 30, 21, 334792, tzinfo=<UTC>)

Exporting data (async)
~~~~~~~~~~~~~~~~~~~~~~

Start a job exporting a table's data asynchronously to a set of CSV files,
located on Google Cloud Storage.  First, create the job locally:

.. doctest::

   >>> from google.cloud import bigquery
   >>> client = bigquery.Client()
   >>> dataset = client.dataset('dataset_name')
   >>> table = dataset.table(name='person_ages')
   >>> job = client.extract_table_to_storage(
   ...     'extract-person-ages-job', table,
   ...     'gs://bucket-name/export-prefix*.csv')
   >>> job.destination_format = 'CSV'
   >>> job.print_header = True
   >>> job.write_disposition = 'truncate'
   >>> job.name
   'extract-person-ages-job'
   >>> job.job_type
   'extract'
   >>> job.created
   None
   >>> job.state
   None

.. note::

   - ``google.cloud.bigquery`` generates a UUID for each job.
   - The ``created`` and ``state`` fields are not set until the job is
     submitted to the BigQuery back-end.

Then, begin executing the job on the server:

.. doctest::

   >>> job.begin()  # API call
   >>> job.created
   datetime.datetime(2015, 7, 23, 9, 30, 20, 268260, tzinfo=<UTC>)
   >>> job.state
   'RUNNING'

Poll until the job is complete:

.. doctest::

   >>> import time
   >>> retry_count = 100
   >>> while retry_count > 0 and job.state != 'DONE':
   ...     retry_count -= 1
   ...     time.sleep(10)
   ...     job.reload()  # API call
   >>> job.state
   'DONE'
   >>> job.ended
   datetime.datetime(2015, 7, 23, 9, 30, 21, 334792, tzinfo=<UTC>)

Copy tables (async)
~~~~~~~~~~~~~~~~~~~

First, create the job locally:

.. doctest::

   >>> from google.cloud import bigquery
   >>> client = bigquery.Client()
   >>> dataset = client.dataset('dataset_name')
   >>> source_table = dataset.table(name='person_ages')
   >>> destination_table = dataset.table(name='person_ages_copy')
   >>> job = client.copy_table(
   ...     'copy-table-job', destination_table, source_table)
   >>> job.name
   'copy-table-job'
   >>> job.job_type
   'copy'
   >>> job.created
   None
   >>> job.state
   None

.. note::

   - ``google.cloud.bigquery`` generates a UUID for each job.
   - The ``created`` and ``state`` fields are not set until the job is
     submitted to the BigQuery back-end.

Then, begin executing the job on the server:

.. doctest::

   >>> job.begin()  # API call
   >>> job.created
   datetime.datetime(2015, 7, 23, 9, 30, 20, 268260, tzinfo=<UTC>)
   >>> job.state
   'RUNNING'

Poll until the job is complete:

.. doctest::

   >>> import time
   >>> retry_count = 100
   >>> while retry_count > 0 and job.state != 'DONE':
   ...     retry_count -= 1
   ...     time.sleep(10)
   ...     job.reload()  # API call
   >>> job.state
   'DONE'
   >>> job.ended
   datetime.datetime(2015, 7, 23, 9, 30, 21, 334792, tzinfo=<UTC>)
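The polling loop above is repeated verbatim for query, load, extract, and copy
jobs.  As a sketch, it can be factored into a small helper that works with any
of the job objects shown in this section; the helper name and its retry
defaults are illustrative, not part of the library:

.. doctest::

   >>> import time
   >>> def wait_for_job(job, max_retries=100, poll_interval=10):
   ...     """Poll ``job`` until its state is 'DONE' or the retries run out."""
   ...     for _ in range(max_retries):
   ...         job.reload()  # API call: refresh ``state`` from the back-end
   ...         if job.state == 'DONE':
   ...             return
   ...         time.sleep(poll_interval)
   ...     raise RuntimeError('Job %s did not complete in time' % job.name)
   >>> wait_for_job(job)
   >>> job.state
   'DONE'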