
Learner Reviews & Feedback for ETL Processing on Google Cloud Using Dataflow and BigQuery by Google Cloud

About the Course

This is a self-paced lab that takes place in the Google Cloud console. In this lab you build several data pipelines that ingest data from a publicly available dataset into BigQuery....
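For readers unfamiliar with the lab: the pipelines are Apache Beam jobs submitted to Dataflow. Below is a minimal sketch of the kind of ingestion pipeline the lab has you build, assuming Beam's Python SDK; the bucket, project, table name, and field names are illustrative placeholders, not taken from the lab. It uses WriteToBigQuery rather than the BigQuerySink that the log in the review below warns is deprecated, and its step names mirror those that appear in that log.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_line(line):
    # Field names are illustrative guesses at the lab's CSV layout.
    fields = ['state', 'gender', 'year', 'name', 'number', 'created_date']
    return dict(zip(fields, line.split(',')))


def run():
    # PipelineOptions() picks up command-line flags such as --project,
    # --region, --runner, --staging_location and --temp_location.
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (p
         | 'Read from a File' >> beam.io.ReadFromText(
             'gs://YOUR_BUCKET/data_files/head_usa_names.csv',  # illustrative path
             skip_header_lines=1)
         | 'String To BigQuery Row' >> beam.Map(parse_line)
         | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
             'YOUR_PROJECT:lake.usa_names',  # illustrative table spec
             schema=('state:STRING,gender:STRING,year:STRING,'
                     'name:STRING,number:STRING,created_date:STRING'),
             create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
             write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))


if __name__ == '__main__':
    run()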

1 - 1 of 1 Reviews for ETL Processing on Google Cloud Using Dataflow and BigQuery

By Blazej K • Aug 18, 2023

This lab is not working! Tried 4 times, with the same outcome every time.

Task 9, step 5 fails:

root@0e180ce8348b:/dataflow# python dataflow_python_examples/data_ingestion.py \
  --project=$PROJECT --region=us-central1 \
  --runner=DataflowRunner \
  --staging_location=gs://$PROJECT/test \
  --temp_location gs://$PROJECT/test \
  --input gs://$PROJECT/data_files/head_usa_names.csv \
  --save_main_session

INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
dataflow_python_examples/data_ingestion.py:128: BeamDeprecationWarning: BigQuerySink is deprecated since 2.11.0. Use WriteToBigQuery instead.
  write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)))
INFO:apache_beam.runners.portability.stager:Downloading source distribution of the SDK from PyPi
INFO:apache_beam.runners.portability.stager:Executing command: ['/usr/local/bin/python', '-m', 'pip', 'download', '--dest', '/tmp/tmpegkwwubh', 'apache-beam==2.24.0', '--no-deps', '--no-binary', ':all:']
[notice] A new release of pip is available: 23.0.1 -> 23.2.1
[notice] To update, run: pip install --upgrade pip
INFO:apache_beam.runners.portability.stager:Staging SDK sources from PyPI: dataflow_python_sdk.tar
INFO:apache_beam.runners.portability.stager:Downloading binary distribution of the SDK from PyPi
INFO:apache_beam.runners.portability.stager:Executing command: ['/usr/local/bin/python', '-m', 'pip', 'download', '--dest', '/tmp/tmpegkwwubh', 'apache-beam==2.24.0', '--no-deps', '--only-binary', ':all:', '--python-version', '37', '--implementation', 'cp', '--abi', 'cp37m', '--platform', 'manylinux1_x86_64']
[notice] A new release of pip is available: 23.0.1 -> 23.2.1
[notice] To update, run: pip install --upgrade pip
INFO:apache_beam.runners.portability.stager:Staging binary distribution of the SDK from PyPI: apache_beam-2.24.0-cp37-cp37m-manylinux1_x86_64.whl
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
INFO:root:Using Python SDK docker image: apache/beam_python3.7_sdk:2.24.0. If the image is not available at local, we will try to pull from hub.docker.com
INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://qwiklabs-gcp-04-2e2a1a6a0771/test/beamapp-root-0816175438-753785.1692208478.753997/pipeline.pb...
INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to gs://qwiklabs-gcp-04-2e2a1a6a0771/test/beamapp-root-0816175438-753785.1692208478.753997/pipeline.pb in 0 seconds.
INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://qwiklabs-gcp-04-2e2a1a6a0771/test/beamapp-root-0816175438-753785.1692208478.753997/pickled_main_session...
INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to gs://qwiklabs-gcp-04-2e2a1a6a0771/test/beamapp-root-0816175438-753785.1692208478.753997/pickled_main_session in 0 seconds.
INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://qwiklabs-gcp-04-2e2a1a6a0771/test/beamapp-root-0816175438-753785.1692208478.753997/dataflow_python_sdk.tar...
INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to gs://qwiklabs-gcp-04-2e2a1a6a0771/test/beamapp-root-0816175438-753785.1692208478.753997/dataflow_python_sdk.tar in 1 seconds.
INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://qwiklabs-gcp-04-2e2a1a6a0771/test/beamapp-root-0816175438-753785.1692208478.753997/apache_beam-2.24.0-cp37-cp37m-manylinux1_x86_64.whl...
INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to gs://qwiklabs-gcp-04-2e2a1a6a0771/test/beamapp-root-0816175438-753785.1692208478.753997/apache_beam-2.24.0-cp37-cp37m-manylinux1_x86_64.whl in 9 seconds.

INFO:apache_beam.runners.dataflow.internal.apiclient:Create job: <Job
 createTime: '2023-08-16T17:54:54.160761Z'
 currentStateTime: '1970-01-01T00:00:00Z'
 id: '2023-08-16_10_54_52-17789858048442316409'
 location: 'us-central1'
 name: 'beamapp-root-0816175438-753785'
 projectId: 'qwiklabs-gcp-04-2e2a1a6a0771'
 stageStates: []
 startTime: '2023-08-16T17:54:54.160761Z'
 steps: []
 tempFiles: []
 type: TypeValueValuesEnum(JOB_TYPE_BATCH, 1)>

INFO:apache_beam.runners.dataflow.internal.apiclient:Created job with id: [2023-08-16_10_54_52-17789858048442316409]
INFO:apache_beam.runners.dataflow.internal.apiclient:Submitted job: 2023-08-16_10_54_52-17789858048442316409
INFO:apache_beam.runners.dataflow.internal.apiclient:To access the Dataflow monitoring console, please navigate to https://console.cloud.google.com/dataflow/jobs/us-central1/2023-08-16_10_54_52-17789858048442316409?project=qwiklabs-gcp-04-2e2a1a6a0771
INFO:apache_beam.runners.dataflow.dataflow_runner:Job 2023-08-16_10_54_52-17789858048442316409 is in state JOB_STATE_PENDING
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:54.809Z: JOB_MESSAGE_DETAILED: Autoscaling is enabled for job 2023-08-16_10_54_52-17789858048442316409. The number of workers will be between 1 and 1000.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:54.902Z: JOB_MESSAGE_DETAILED: Autoscaling was automatically enabled for job 2023-08-16_10_54_52-17789858048442316409.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:57.584Z: JOB_MESSAGE_BASIC: Worker configuration: n1-standard-1 in us-central1-f.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:58.162Z: JOB_MESSAGE_DETAILED: Expanding CoGroupByKey operations into optimizable parts.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:58.192Z: JOB_MESSAGE_DETAILED: Expanding GroupByKey operations into optimizable parts.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:58.213Z: JOB_MESSAGE_DETAILED: Lifting ValueCombiningMappingFns into MergeBucketsMappingFns
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:58.238Z: JOB_MESSAGE_DEBUG: Annotating graph with Autotuner information.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:58.277Z: JOB_MESSAGE_DETAILED: Fusing adjacent ParDo, Read, Write, and Flatten operations
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:58.309Z: JOB_MESSAGE_DETAILED: Fusing consumer String To BigQuery Row into Read from a File/Read
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:58.328Z: JOB_MESSAGE_DETAILED: Fusing consumer Write to BigQuery/NativeWrite into String To BigQuery Row
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:58.370Z: JOB_MESSAGE_DEBUG: Workflow config is missing a default resource spec.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:58.401Z: JOB_MESSAGE_DEBUG: Adding StepResource setup and teardown to workflow graph.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:58.424Z: JOB_MESSAGE_DEBUG: Adding workflow start and stop steps.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:58.453Z: JOB_MESSAGE_DEBUG: Assigning stage ids.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:58.592Z: JOB_MESSAGE_DEBUG: Executing wait step start3
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:58.648Z: JOB_MESSAGE_BASIC: Executing operation Read from a File/Read+String To BigQuery Row+Write to BigQuery/NativeWrite
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:58.688Z: JOB_MESSAGE_DEBUG: Starting worker pool setup.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:54:58.725Z: JOB_MESSAGE_BASIC: Starting 1 workers in us-central1-f...
INFO:apache_beam.runners.dataflow.dataflow_runner:Job 2023-08-16_10_54_52-17789858048442316409 is in state JOB_STATE_RUNNING

INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:55:56.129Z: JOB_MESSAGE_ERROR: Startup of the worker pool in zone us-central1-f failed to bring up any of the desired 1 workers. Please refer to https://cloud.google.com/dataflow/docs/guides/common-errors#worker-pool-failure for help troubleshooting. ZONE_RESOURCE_POOL_EXHAUSTED: Instance 'beamapp-root-0816175438-7-08161054-fpie-harness-cwjb' creation failed: The zone 'projects/qwiklabs-gcp-04-2e2a1a6a0771/zones/us-central1-f' does not have enough resources available to fulfill the request. Try a different zone, or try again later.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:55:56.169Z: JOB_MESSAGE_ERROR: Workflow failed.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:55:56.221Z: JOB_MESSAGE_BASIC: Executing BigQuery import job "dataflow_job_6900608637656064275". You can check its status with the bq tool: "bq show -j --project_id=qwiklabs-gcp-04-2e2a1a6a0771 dataflow_job_6900608637656064275".
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:55:56.244Z: JOB_MESSAGE_WARNING: Unable to delete temp directory: "gs://qwiklabs-gcp-04-2e2a1a6a0771/test/beamapp-root-0816175438-753785.1692208478.753997/6900608637656065782/dax-tmp-2023-08-16_10_54_52-17789858048442316409-S01-0-5564021760e6e6be". Causes: Unable to view metadata for files: gs://qwiklabs-gcp-04-2e2a1a6a0771/test/beamapp-root-0816175438-753785.1692208478.753997/6900608637656065782/dax-tmp-2023-08-16_10_54_52-17789858048442316409-S01-0-5564021760e6e6be.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:55:56.267Z: JOB_MESSAGE_WARNING: S01:Read from a File/Read+String To BigQuery Row+Write to BigQuery/NativeWrite failed.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:55:56.295Z: JOB_MESSAGE_BASIC: Finished operation Read from a File/Read+String To BigQuery Row+Write to BigQuery/NativeWrite
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:55:56.343Z: JOB_MESSAGE_DETAILED: Cleaning up.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:55:56.385Z: JOB_MESSAGE_DEBUG: Starting worker pool teardown.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:55:56.401Z: JOB_MESSAGE_BASIC: Stopping worker pool...
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:56:17.736Z: JOB_MESSAGE_BASIC: Worker pool stopped.
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-16T17:56:17.763Z: JOB_MESSAGE_DEBUG: Tearing down pending resources...
INFO:apache_beam.runners.dataflow.dataflow_runner:Job 2023-08-16_10_54_52-17789858048442316409 is in state JOB_STATE_FAILED

Traceback (most recent call last):
  File "dataflow_python_examples/data_ingestion.py", line 134, in <module>
    run()
  File "dataflow_python_examples/data_ingestion.py", line 129, in run
    p.run().wait_until_finish()
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1633, in wait_until_finish
    self)
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Workflow failed.
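Editor's note: the failure in this log is not in the pipeline code itself. ZONE_RESOURCE_POOL_EXHAUSTED means Compute Engine had no capacity for the worker VM in us-central1-f at that moment, so, as the error message itself suggests, simply retrying later often succeeds. If retries keep landing in the exhausted zone and the lab environment permits it, Beam SDKs of this vintage also accept a --worker_zone pipeline option to pin workers to another zone in the same region. A possible re-run is sketched below; the choice of us-central1-b is arbitrary and not part of the lab instructions:

python dataflow_python_examples/data_ingestion.py \
  --project=$PROJECT --region=us-central1 \
  --worker_zone=us-central1-b \
  --runner=DataflowRunner \
  --staging_location=gs://$PROJECT/test \
  --temp_location gs://$PROJECT/test \
  --input gs://$PROJECT/data_files/head_usa_names.csv \
  --save_main_session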