[Deprecated] TFX Metadata Reviews

2231 reviews

The cluster deployment is stuck at 64% and no longer progresses. Current errors: [GCE_STOCKOUT]:

Amine N. · Reviewed about 1 year ago

AI Platform Pipelines is deprecated and is not functioning. Cannot deploy the pipeline package.

Kunal A. · Reviewed about 1 year ago

The cluster deployment is stuck at 64% and no longer progresses.

Amine N. · Reviewed about 1 year ago

RAAGHAVAN K. · Reviewed about 1 year ago

Jesus Eduardo J. · Reviewed about 1 year ago

One more deprecated lab.

Praveen C. · Reviewed about 1 year ago

The tuner failed, so all subsequent steps failed as well (I tried archiving the run, but it gave me the same error): 2024-09-03 08:33:37.709428: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lo ... --------------------------------------------------------------------------- _InactiveRpcError Traceback (most recent call last) /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in _call_method(self, method_name, request, response) 201 try: --> 202 response.CopyFrom(grpc_method(request, timeout=self._grpc_timeout_sec)) 203 except grpc.RpcError as e: /opt/conda/lib/python3.7/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression) 945 wait_for_ready, compression) --> 946 return _end_unary_response_blocking(state, call, False, None) 947 /opt/conda/lib/python3.7/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline) 848 else: --> 849 raise _InactiveRpcError(state) 850 _InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "failed to connect to all addresses" debug_error_string = "{"created":"@1725353041.309480180","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3093,"referenced_errors":[{"created":"@1725353041.309478959","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}" > During handling of the above exception, another exception occurred: UnavailableError Traceback (most recent call last) /tmp/ipykernel_18688/3443186458.py in <module> ----> 1 for artifact_type in store.get_artifact_types(): 2 print(artifact_type.name) 
/opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in get_artifact_types(self) 685 response = metadata_store_service_pb2.GetArtifactTypesResponse() 686 --> 687 self._call('GetArtifactTypes', request, response) 688 result = [] 689 for x in response.artifact_types: /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in _call(self, method_name, request, response) 175 while True: 176 try: --> 177 return self._call_method(method_name, request, response) 178 except errors.AbortedError: 179 num_retries -= 1 /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in _call_method(self, method_name, request, response) 205 # description. 206 # https://grpc.github.io/grpc/python/_modules/grpc.html#StatusCode --> 207 raise _make_exception(e.details(), e.code().value[0]) # pytype: disable=attribute-error 208 209 def _swig_call(self, method, request, response) -> None: UnavailableError: failed to connect to all addresses --------------------------------------------------------------------------- _InactiveRpcError Traceback (most recent call last) /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in _call_method(self, method_name, request, response) 201 try: --> 202 response.CopyFrom(grpc_method(request, timeout=self._grpc_timeout_sec)) 203 except grpc.RpcError as e: /opt/conda/lib/python3.7/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression) 945 wait_for_ready, compression) --> 946 return _end_unary_response_blocking(state, call, False, None) 947 /opt/conda/lib/python3.7/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline) 848 else: --> 849 raise _InactiveRpcError(state) 850 _InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "failed to connect to all addresses" debug_error_string = 
"{"created":"@1725353064.696169603","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3093,"referenced_errors":[{"created":"@1725353064.696167689","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}" > During handling of the above exception, another exception occurred: UnavailableError Traceback (most recent call last) /tmp/ipykernel_18688/4236999377.py in <module> ----> 1 for execution_type in store.get_execution_types(): 2 print(execution_type.name) /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in get_execution_types(self) 724 response = metadata_store_service_pb2.GetExecutionTypesResponse() 725 --> 726 self._call('GetExecutionTypes', request, response) 727 result = [] 728 for x in response.execution_types: /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in _call(self, method_name, request, response) 175 while True: 176 try: --> 177 return self._call_method(method_name, request, response) 178 except errors.AbortedError: 179 num_retries -= 1 /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in _call_method(self, method_name, request, response) 205 # description. 
206 # https://grpc.github.io/grpc/python/_modules/grpc.html#StatusCode --> 207 raise _make_exception(e.details(), e.code().value[0]) # pytype: disable=attribute-error 208 209 def _swig_call(self, method, request, response) -> None: UnavailableError: failed to connect to all addresses --------------------------------------------------------------------------- _InactiveRpcError Traceback (most recent call last) /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in _call_method(self, method_name, request, response) 201 try: --> 202 response.CopyFrom(grpc_method(request, timeout=self._grpc_timeout_sec)) 203 except grpc.RpcError as e: /opt/conda/lib/python3.7/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression) 945 wait_for_ready, compression) --> 946 return _end_unary_response_blocking(state, call, False, None) 947 /opt/conda/lib/python3.7/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline) 848 else: --> 849 raise _InactiveRpcError(state) 850 _InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "failed to connect to all addresses" debug_error_string = "{"created":"@1725353069.086763692","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3093,"referenced_errors":[{"created":"@1725353069.086761886","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}" > During handling of the above exception, another exception occurred: UnavailableError Traceback (most recent call last) /tmp/ipykernel_18688/768924533.py in <module> ----> 1 for context_type in store.get_context_types(): 2 print(context_type.name) /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in get_context_types(self) 763 response = 
metadata_store_service_pb2.GetContextTypesResponse() 764 --> 765 self._call('GetContextTypes', request, response) 766 result = [] 767 for x in response.context_types: /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in _call(self, method_name, request, response) 175 while True: 176 try: --> 177 return self._call_method(method_name, request, response) 178 except errors.AbortedError: 179 num_retries -= 1 /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in _call_method(self, method_name, request, response) 205 # description. 206 # https://grpc.github.io/grpc/python/_modules/grpc.html#StatusCode --> 207 raise _make_exception(e.details(), e.code().value[0]) # pytype: disable=attribute-error 208 209 def _swig_call(self, method, request, response) -> None: UnavailableError: failed to connect to all addresses --------------------------------------------------------------------------- _InactiveRpcError Traceback (most recent call last) /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in _call_method(self, method_name, request, response) 201 try: --> 202 response.CopyFrom(grpc_method(request, timeout=self._grpc_timeout_sec)) 203 except grpc.RpcError as e: /opt/conda/lib/python3.7/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression) 945 wait_for_ready, compression) --> 946 return _end_unary_response_blocking(state, call, False, None) 947 /opt/conda/lib/python3.7/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline) 848 else: --> 849 raise _InactiveRpcError(state) 850 _InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "failed to connect to all addresses" debug_error_string = "{"created":"@1725353183.044450646","description":"Failed to pick 
subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3093,"referenced_errors":[{"created":"@1725353183.044449468","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}" > During handling of the above exception, another exception occurred: UnavailableError Traceback (most recent call last) /tmp/ipykernel_18688/1900152474.py in <module> 1 with metadata.Metadata(connection_config) as store: ----> 2 schema_artifacts = store.get_artifacts_by_type(standard_artifacts.Schema.TYPE_NAME) 3 stats_artifacts = store.get_artifacts_by_type(standard_artifacts.ExampleStatistics.TYPE_NAME) 4 anomalies_artifacts = store.get_artifacts_by_type(standard_artifacts.ExampleAnomalies.TYPE_NAME) /opt/conda/lib/python3.7/site-packages/tfx/orchestration/metadata.py in get_artifacts_by_type(self, type_name) 250 self, type_name: Text) -> List[metadata_store_pb2.Artifact]: 251 """Fetches artifacts given artifact type name.""" --> 252 return self.store.get_artifacts_by_type(type_name) 253 254 # TODO(b/145751019): Remove this once migrated to use MLMD built-in states. /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in get_artifacts_by_type(self, type_name) 585 response = metadata_store_service_pb2.GetArtifactsByTypeResponse() 586 --> 587 self._call('GetArtifactsByType', request, response) 588 result = [] 589 for x in response.artifacts: /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in _call(self, method_name, request, response) 175 while True: 176 try: --> 177 return self._call_method(method_name, request, response) 178 except errors.AbortedError: 179 num_retries -= 1 /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in _call_method(self, method_name, request, response) 205 # description. 
206 # https://grpc.github.io/grpc/python/_modules/grpc.html#StatusCode --> 207 raise _make_exception(e.details(), e.code().value[0]) # pytype: disable=attribute-error 208 209 def _swig_call(self, method, request, response) -> None: UnavailableError: failed to connect to all addresses --------------------------------------------------------------------------- NameError Traceback (most recent call last) /tmp/ipykernel_18688/4213622925.py in <module> ----> 1 schema_file = os.path.join(schema_artifacts[-1].uri, 'schema.pbtxt') 2 print("Generated schame file:{}".format(schema_file)) 3 4 stats_path = stats_artifacts[-1].uri 5 train_stats_file = os.path.join(stats_path, 'train', 'stats_tfrecord') NameError: name 'schema_artifacts' is not defined -------------------------------------------------------------------------- NameError Traceback (most recent call last) /tmp/ipykernel_18688/1981782273.py in <module> ----> 1 schema = tfdv.load_schema_text(schema_file) 2 tfdv.display_schema(schema=schema) NameError: name 'schema_file' is not defined -------------------------------------------------------------------------- NameError Traceback (most recent call last) /tmp/ipykernel_18688/4235532081.py in <module> ----> 1 train_stats = tfdv.load_statistics(train_stats_file) 2 eval_stats = tfdv.load_statistics(eval_stats_file) 3 tfdv.visualize_statistics(lhs_statistics=eval_stats, rhs_statistics=train_stats, 4 lhs_name='EVAL_DATASET', rhs_name='TRAIN_DATASET') NameError: name 'train_stats_file' is not defined --------------------------------------------------------------------------- NameError Traceback (most recent call last) /tmp/ipykernel_18688/804134886.py in <module> ----> 1 train_anomalies = tfdv.load_anomalies_text(train_anomalies_file) 2 tfdv.display_anomalies(train_anomalies) NameError: name 'train_anomalies_file' is not defined --------------------------------------------------------------------------- NameError Traceback (most recent call last) 
/tmp/ipykernel_18688/2432183026.py in <module> ----> 1 eval_anomalies = tfdv.load_anomalies_text(eval_anomalies_file) 2 tfdv.display_anomalies(eval_anomalies) NameError: name 'eval_anomalies_file' is not defined --------------------------------------------------------------------------- _InactiveRpcError Traceback (most recent call last) /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in _call_method(self, method_name, request, response) 201 try: --> 202 response.CopyFrom(grpc_method(request, timeout=self._grpc_timeout_sec)) 203 except grpc.RpcError as e: /opt/conda/lib/python3.7/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression) 945 wait_for_ready, compression) --> 946 return _end_unary_response_blocking(state, call, False, None) 947 /opt/conda/lib/python3.7/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline) 848 else: --> 849 raise _InactiveRpcError(state) 850 _InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "failed to connect to all addresses" debug_error_string = "{"created":"@1725353200.878418859","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3093,"referenced_errors":[{"created":"@1725353200.878417252","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}" > During handling of the above exception, another exception occurred: UnavailableError Traceback (most recent call last) /tmp/ipykernel_18688/3595433068.py in <module> 1 with metadata.Metadata(connection_config) as store: ----> 2 model_eval_artifacts = store.get_artifacts_by_type(standard_artifacts.ModelEvaluation.TYPE_NAME) 3 hyperparam_artifacts = store.get_artifacts_by_type(standard_artifacts.HyperParameters.TYPE_NAME) 4 5 model_eval_path = 
model_eval_artifacts[-1].uri /opt/conda/lib/python3.7/site-packages/tfx/orchestration/metadata.py in get_artifacts_by_type(self, type_name) 250 self, type_name: Text) -> List[metadata_store_pb2.Artifact]: 251 """Fetches artifacts given artifact type name.""" --> 252 return self.store.get_artifacts_by_type(type_name) 253 254 # TODO(b/145751019): Remove this once migrated to use MLMD built-in states. /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in get_artifacts_by_type(self, type_name) 585 response = metadata_store_service_pb2.GetArtifactsByTypeResponse() 586 --> 587 self._call('GetArtifactsByType', request, response) 588 result = [] 589 for x in response.artifacts: /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in _call(self, method_name, request, response) 175 while True: 176 try: --> 177 return self._call_method(method_name, request, response) 178 except errors.AbortedError: 179 num_retries -= 1 /opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in _call_method(self, method_name, request, response) 205 # description. 206 # https://grpc.github.io/grpc/python/_modules/grpc.html#StatusCode --> 207 raise _make_exception(e.details(), e.code().value[0]) # pytype: disable=attribute-error 208 209 def _swig_call(self, method, request, response) -> None: UnavailableError: failed to connect to all addresses --------------------------------------------------------------------------- NameError Traceback (most recent call last) /tmp/ipykernel_18688/2981799238.py in <module> 1 # Latest pipeline run Tuner search space. 
----> 2 json.loads(file_io.read_file_to_string(best_hparams_path))['space'] NameError: name 'best_hparams_path' is not defined --------------------------------------------------------------------------- NameError Traceback (most recent call last) /tmp/ipykernel_18688/163579549.py in <module> 1 # Latest pipeline run Tuner searched best_hyperparameters artifacts. ----> 2 json.loads(file_io.read_file_to_string(best_hparams_path))['values'] NameError: name 'best_hparams_path' is not defined --------------------------------------------------------------------------- NameError Traceback (most recent call last) /tmp/ipykernel_18688/3972218752.py in <module> ----> 1 eval_result = tfma.load_eval_result(model_eval_path) 2 tfma.view.render_slicing_metrics( 3 eval_result, slicing_column='Wilderness_Area') NameError: name 'model_eval_path' is not defined

Wiehan W. · Reviewed about 1 year ago

Amine N. · Reviewed about 1 year ago

The lab is broken

Boris Enrique M. · Reviewed about 1 year ago

Another outdated lab for this course

Liam B. · Reviewed about 1 year ago

For some reason the pipeline run in this lab has the tuner turned on when it really should not be using the tuner based on previous labs. The run failed inside the tuner stage, here's the logs: time="2024-08-25T20:59:56.022Z" level=info msg="capturing logs" argo=true 2024-08-25 20:59:56.628983: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib 2024-08-25 20:59:56.629036: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. INFO:absl:tensorflow_ranking is not available: No module named 'tensorflow_ranking' INFO:absl:tensorflow_text is not available: No module named 'tensorflow_text' INFO:absl:Running driver for Tuner INFO:absl:MetadataStore with gRPC connection initialized INFO:absl:Adding KFP pod name tfx-covertype-lab-04-h5w92-4165073768 to execution INFO:absl:Running executor for Tuner INFO:absl:Attempting to infer TFX Python dependency for beam INFO:absl:Copying all content from install dir /tfx-src/tfx to temp dir /tmp/tmpcocfdac0/build/tfx INFO:absl:Generating a temp setup file at /tmp/tmpcocfdac0/build/tfx/setup.py INFO:absl:Creating temporary sdist package, logs available at /tmp/tmpcocfdac0/build/tfx/setup.log INFO:absl:Added --extra_package=/tmp/tmpcocfdac0/build/tfx/dist/tfx_ephemeral-0.25.0.tar.gz to beam args WARNING:absl:workerCount is overridden with 2 INFO:absl:json_inputs='{"examples": [{"artifact": {"id": "5", "type_id": "18", "uri": "gs://qwiklabs-gcp-01-8b52d2fcf958-kubeflowpipelines-default//tfx_covertype_lab_04/aec7304e-9b62-4712-ad01-85d8dd6f711d/Transform/transformed_examples/5", "properties": {"split_names": {"string_value": "[\"train\", \"eval\"]"}}, "custom_properties": {"name": {"string_value": "transformed_examples"}, "state": {"string_value": "published"}, "producer_component": 
{"string_value": "Transform"}}, "state": "LIVE", "create_time_since_epoch": "1724619483968", "last_update_time_since_epoch": "1724619580784"}, "artifact_type": {"id": "18", "name": "Examples", "properties": {"split_names": "STRING", "version": "INT", "span": "INT"}}, "__artifact_class_module__": "tfx.types.standard_artifacts", "__artifact_class_name__": "Examples"}], "transform_graph": [{"artifact": {"id": "4", "type_id": "22", "uri": "gs://qwiklabs-gcp-01-8b52d2fcf958-kubeflowpipelines-default//tfx_covertype_lab_04/aec7304e-9b62-4712-ad01-85d8dd6f711d/Transform/transform_graph/5", "custom_properties": {"state": {"string_value": "published"}, "name": {"string_value": "transform_graph"}, "producer_component": {"string_value": "Transform"}}, "state": "LIVE", "create_time_since_epoch": "1724619483965", "last_update_time_since_epoch": "1724619580782"}, "artifact_type": {"id": "22", "name": "TransformGraph"}, "__artifact_class_module__": "tfx.types.standard_artifacts", "__artifact_class_name__": "TransformGraph"}]}'. INFO:absl:json_outputs='{"best_hyperparameters": [{"artifact": {"id": "9", "type_id": "28", "uri": "gs://qwiklabs-gcp-01-8b52d2fcf958-kubeflowpipelines-default//tfx_covertype_lab_04/aec7304e-9b62-4712-ad01-85d8dd6f711d/Tuner/best_hyperparameters/8", "custom_properties": {"name": {"string_value": "best_hyperparameters"}, "producer_component": {"string_value": "Tuner"}}}, "artifact_type": {"id": "28", "name": "HyperParameters"}, "__artifact_class_module__": "tfx.types.standard_artifacts", "__artifact_class_name__": "HyperParameters"}]}'. 
INFO:absl:json_exec_properties='{"custom_config": "{\"ai_platform_training_args\": {\"masterConfig\": {\"imageUri\": \"gcr.io/qwiklabs-gcp-01-8b52d2fcf958/tfx_covertype_lab_04@sha256:f29bff7ce54b6232257cd50901602a0349b3f07ca7df72bbbe7c94837e45925f\"}, \"project\": \"qwiklabs-gcp-01-8b52d2fcf958\", \"region\": \"us-central1\", \"serviceAccount\": \"tfx-tuner-caip-service-account@qwiklabs-gcp-01-8b52d2fcf958.iam.gserviceaccount.com\"}}", "eval_args": "{\"num_steps\": 500}", "kfp_pod_name": "tfx-covertype-lab-04-h5w92-4165073768", "module_file": "model.py", "train_args": "{\"num_steps\": 5000}", "tune_args": "{\n \"num_parallel_trials\": 3\n}", "tuner_fn": null}'. WARNING:googleapiclient.discovery_cache:file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth Traceback (most recent call last): File "/usr/local/lib/python3.7/dist-packages/googleapiclient/discovery_cache/__init__.py", line 36, in autodetect from google.appengine.api import memcache ModuleNotFoundError: No module named 'google.appengine' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/local/lib/python3.7/dist-packages/googleapiclient/discovery_cache/file_cache.py", line 33, in <module> from oauth2client.contrib.locked_file import LockedFile ModuleNotFoundError: No module named 'oauth2client.contrib.locked_file' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/local/lib/python3.7/dist-packages/googleapiclient/discovery_cache/file_cache.py", line 37, in <module> from oauth2client.locked_file import LockedFile ModuleNotFoundError: No module named 'oauth2client.locked_file' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/local/lib/python3.7/dist-packages/googleapiclient/discovery_cache/__init__.py", line 42, in autodetect from . 
import file_cache File "/usr/local/lib/python3.7/dist-packages/googleapiclient/discovery_cache/file_cache.py", line 41, in <module> "file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth" ImportError: file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth INFO:googleapiclient.discovery:URL being requested: GET https://www.googleapis.com/discovery/v1/apis/ml/v1/rest INFO:absl:TrainingInput={'masterConfig': {'imageUri': 'gcr.io/qwiklabs-gcp-01-8b52d2fcf958/tfx_covertype_lab_04@sha256:f29bff7ce54b6232257cd50901602a0349b3f07ca7df72bbbe7c94837e45925f', 'containerCommand': ['python', '-m', 'tfx.scripts.run_executor', '--executor_class_path', 'tfx.extensions.google_cloud_ai_platform.tuner.executor._WorkerExecutor', '--inputs', '{"examples": [{"artifact": {"id": "5", "type_id": "18", "uri": "gs://qwiklabs-gcp-01-8b52d2fcf958-kubeflowpipelines-default//tfx_covertype_lab_04/aec7304e-9b62-4712-ad01-85d8dd6f711d/Transform/transformed_examples/5", "properties": {"split_names": {"string_value": "[\\"train\\", \\"eval\\"]"}}, "custom_properties": {"name": {"string_value": "transformed_examples"}, "state": {"string_value": "published"}, "producer_component": {"string_value": "Transform"}}, "state": "LIVE", "create_time_since_epoch": "1724619483968", "last_update_time_since_epoch": "1724619580784"}, "artifact_type": {"id": "18", "name": "Examples", "properties": {"split_names": "STRING", "version": "INT", "span": "INT"}}, "__artifact_class_module__": "tfx.types.standard_artifacts", "__artifact_class_name__": "Examples"}], "transform_graph": [{"artifact": {"id": "4", "type_id": "22", "uri": "gs://qwiklabs-gcp-01-8b52d2fcf958-kubeflowpipelines-default//tfx_covertype_lab_04/aec7304e-9b62-4712-ad01-85d8dd6f711d/Transform/transform_graph/5", "custom_properties": {"state": {"string_value": "published"}, "name": {"string_value": "transform_graph"}, "producer_component": {"string_value": "Transform"}}, "state": "LIVE", "create_time_since_epoch": 
"1724619483965", "last_update_time_since_epoch": "1724619580782"}, "artifact_type": {"id": "22", "name": "TransformGraph"}, "__artifact_class_module__": "tfx.types.standard_artifacts", "__artifact_class_name__": "TransformGraph"}]}', '--outputs', '{"best_hyperparameters": [{"artifact": {"id": "9", "type_id": "28", "uri": "gs://qwiklabs-gcp-01-8b52d2fcf958-kubeflowpipelines-default//tfx_covertype_lab_04/aec7304e-9b62-4712-ad01-85d8dd6f711d/Tuner/best_hyperparameters/8", "custom_properties": {"name": {"string_value": "best_hyperparameters"}, "producer_component": {"string_value": "Tuner"}}}, "artifact_type": {"id": "28", "name": "HyperParameters"}, "__artifact_class_module__": "tfx.types.standard_artifacts", "__artifact_class_name__": "HyperParameters"}]}', '--exec-properties', '{"custom_config": "{\\"ai_platform_training_args\\": {\\"masterConfig\\": {\\"imageUri\\": \\"gcr.io/qwiklabs-gcp-01-8b52d2fcf958/tfx_covertype_lab_04@sha256:f29bff7ce54b6232257cd50901602a0349b3f07ca7df72bbbe7c94837e45925f\\"}, \\"project\\": \\"qwiklabs-gcp-01-8b52d2fcf958\\", \\"region\\": \\"us-central1\\", \\"serviceAccount\\": \\"tfx-tuner-caip-service-account@qwiklabs-gcp-01-8b52d2fcf958.iam.gserviceaccount.com\\"}}", "eval_args": "{\\"num_steps\\": 500}", "kfp_pod_name": "tfx-covertype-lab-04-h5w92-4165073768", "module_file": "model.py", "train_args": "{\\"num_steps\\": 5000}", "tune_args": "{\\n \\"num_parallel_trials\\": 3\\n}", "tuner_fn": null}']}, 'region': 'us-central1', 'serviceAccount': 'tfx-tuner-caip-service-account@qwiklabs-gcp-01-8b52d2fcf958.iam.gserviceaccount.com', 'workerCount': 2, 'scaleTier': 'CUSTOM', 'masterType': 'standard', 'workerType': 'standard'} INFO:absl:Submitting job='tfx_tuner_20240825210006', project='qwiklabs-gcp-01-8b52d2fcf958' to AI Platform. 
INFO:googleapiclient.discovery:URL being requested: POST https://ml.googleapis.com/v1/projects/qwiklabs-gcp-01-8b52d2fcf958/jobs?alt=json INFO:googleapiclient.discovery:URL being requested: GET https://ml.googleapis.com/v1/projects/qwiklabs-gcp-01-8b52d2fcf958/jobs/tfx_tuner_20240825210006?alt=json ERROR:absl:Job 'projects/qwiklabs-gcp-01-8b52d2fcf958/jobs/tfx_tuner_20240825210006' did not succeed. Detailed response {'jobId': 'tfx_tuner_20240825210006', 'trainingInput': {'scaleTier': 'CUSTOM', 'masterType': 'standard', 'workerType': 'standard', 'workerCount': '2', 'region': 'us-central1', 'masterConfig': {'imageUri': 'gcr.io/qwiklabs-gcp-01-8b52d2fcf958/tfx_covertype_lab_04@sha256:f29bff7ce54b6232257cd50901602a0349b3f07ca7df72bbbe7c94837e45925f', 'containerCommand': ['python', '-m', 'tfx.scripts.run_executor', '--executor_class_path', 'tfx.extensions.google_cloud_ai_platform.tuner.executor._WorkerExecutor', '--inputs', '{"examples": [{"artifact": {"id": "5", "type_id": "18", "uri": "gs://qwiklabs-gcp-01-8b52d2fcf958-kubeflowpipelines-default//tfx_covertype_lab_04/aec7304e-9b62-4712-ad01-85d8dd6f711d/Transform/transformed_examples/5", "properties": {"split_names": {"string_value": "[\\"train\\", \\"eval\\"]"}}, "custom_properties": {"name": {"string_value": "transformed_examples"}, "state": {"string_value": "published"}, "producer_component": {"string_value": "Transform"}}, "state": "LIVE", "create_time_since_epoch": "1724619483968", "last_update_time_since_epoch": "1724619580784"}, "artifact_type": {"id": "18", "name": "Examples", "properties": {"split_names": "STRING", "version": "INT", "span": "INT"}}, "__artifact_class_module__": "tfx.types.standard_artifacts", "__artifact_class_name__": "Examples"}], "transform_graph": [{"artifact": {"id": "4", "type_id": "22", "uri": "gs://qwiklabs-gcp-01-8b52d2fcf958-kubeflowpipelines-default//tfx_covertype_lab_04/aec7304e-9b62-4712-ad01-85d8dd6f711d/Transform/transform_graph/5", "custom_properties": {"state": 
{"string_value": "published"}, "name": {"string_value": "transform_graph"}, "producer_component": {"string_value": "Transform"}}, "state": "LIVE", "create_time_since_epoch": "1724619483965", "last_update_time_since_epoch": "1724619580782"}, "artifact_type": {"id": "22", "name": "TransformGraph"}, "__artifact_class_module__": "tfx.types.standard_artifacts", "__artifact_class_name__": "TransformGraph"}]}', '--outputs', '{"best_hyperparameters": [{"artifact": {"id": "9", "type_id": "28", "uri": "gs://qwiklabs-gcp-01-8b52d2fcf958-kubeflowpipelines-default//tfx_covertype_lab_04/aec7304e-9b62-4712-ad01-85d8dd6f711d/Tuner/best_hyperparameters/8", "custom_properties": {"name": {"string_value": "best_hyperparameters"}, "producer_component": {"string_value": "Tuner"}}}, "artifact_type": {"id": "28", "name": "HyperParameters"}, "__artifact_class_module__": "tfx.types.standard_artifacts", "__artifact_class_name__": "HyperParameters"}]}', '--exec-properties', '{"custom_config": "{\\"ai_platform_training_args\\": {\\"masterConfig\\": {\\"imageUri\\": \\"gcr.io/qwiklabs-gcp-01-8b52d2fcf958/tfx_covertype_lab_04@sha256:f29bff7ce54b6232257cd50901602a0349b3f07ca7df72bbbe7c94837e45925f\\"}, \\"project\\": \\"qwiklabs-gcp-01-8b52d2fcf958\\", \\"region\\": \\"us-central1\\", \\"serviceAccount\\": \\"tfx-tuner-caip-service-account@qwiklabs-gcp-01-8b52d2fcf958.iam.gserviceaccount.com\\"}}", "eval_args": "{\\"num_steps\\": 500}", "kfp_pod_name": "tfx-covertype-lab-04-h5w92-4165073768", "module_file": "model.py", "train_args": "{\\"num_steps\\": 5000}", "tune_args": "{\\n \\"num_parallel_trials\\": 3\\n}", "tuner_fn": null}']}, 'serviceAccount': 'tfx-tuner-caip-service-account@qwiklabs-gcp-01-8b52d2fcf958.iam.gserviceaccount.com'}, 'createTime': '2024-08-25T21:00:25Z', 'startTime': '2024-08-25T21:09:45Z', 'endTime': '2024-08-25T21:09:48Z', 'state': 'FAILED', 'errorMessage': 'The replica worker 0 exited with a non-zero status of 1. Termination reason: Error. 
To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=478749246700&resource=ml_job%2Fjob_id%2Ftfx_tuner_20240825210006&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22tfx_tuner_20240825210006%22 ', 'trainingOutput': {}, 'labels': {'tfx_executor': 'ensions-google_cloud_ai_platform-tuner-executor-_workerexecutor', 'tfx_py_version': '3-7', 'tfx_runner': 'kfp', 'tfx_version': '0-25-0'}, 'etag': 'iPZ4v+6czZE=', 'jobPosition': '0'}. Traceback (most recent call last): File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 360, in <module> main() File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 353, in main execution_info = launcher.launch() File "/tfx-src/tfx/orchestration/launcher/base_component_launcher.py", line 209, in launch copy.deepcopy(execution_decision.exec_properties)) File "/tfx-src/tfx/orchestration/launcher/in_process_component_launcher.py", line 72, in _run_executor copy.deepcopy(input_dict), output_dict, copy.deepcopy(exec_properties)) File "/tfx-src/tfx/extensions/google_cloud_ai_platform/tuner/executor.py", line 121, in Do job_id) File "/tfx-src/tfx/extensions/google_cloud_ai_platform/runner.py", line 305, in start_aip_training job_labels=job_labels) File "/tfx-src/tfx/extensions/google_cloud_ai_platform/runner.py", line 194, in _launch_aip_training raise RuntimeError(err_msg) RuntimeError: Job 'projects/qwiklabs-gcp-01-8b52d2fcf958/jobs/tfx_tuner_20240825210006' did not succeed. 
Detailed response {'jobId': 'tfx_tuner_20240825210006', 'trainingInput': {'scaleTier': 'CUSTOM', 'masterType': 'standard', 'workerType': 'standard', 'workerCount': '2', 'region': 'us-central1', 'masterConfig': {'imageUri': 'gcr.io/qwiklabs-gcp-01-8b52d2fcf958/tfx_covertype_lab_04@sha256:f29bff7ce54b6232257cd50901602a0349b3f07ca7df72bbbe7c94837e45925f', 'containerCommand': ['python', '-m', 'tfx.scripts.run_executor', '--executor_class_path', 'tfx.extensions.google_cloud_ai_platform.tuner.executor._WorkerExecutor', '--inputs', '{"examples": [{"artifact": {"id": "5", "type_id": "18", "uri": "gs://qwiklabs-gcp-01-8b52d2fcf958-kubeflowpipelines-default//tfx_covertype_lab_04/aec7304e-9b62-4712-ad01-85d8dd6f711d/Transform/transformed_examples/5", "properties": {"split_names": {"string_value": "[\\"train\\", \\"eval\\"]"}}, "custom_properties": {"name": {"string_value": "transformed_examples"}, "state": {"string_value": "published"}, "producer_component": {"string_value": "Transform"}}, "state": "LIVE", "create_time_since_epoch": "1724619483968", "last_update_time_since_epoch": "1724619580784"}, "artifact_type": {"id": "18", "name": "Examples", "properties": {"split_names": "STRING", "version": "INT", "span": "INT"}}, "__artifact_class_module__": "tfx.types.standard_artifacts", "__artifact_class_name__": "Examples"}], "transform_graph": [{"artifact": {"id": "4", "type_id": "22", "uri": "gs://qwiklabs-gcp-01-8b52d2fcf958-kubeflowpipelines-default//tfx_covertype_lab_04/aec7304e-9b62-4712-ad01-85d8dd6f711d/Transform/transform_graph/5", "custom_properties": {"state": {"string_value": "published"}, "name": {"string_value": "transform_graph"}, "producer_component": {"string_value": "Transform"}}, "state": "LIVE", "create_time_since_epoch": "1724619483965", "last_update_time_since_epoch": "1724619580782"}, "artifact_type": {"id": "22", "name": "TransformGraph"}, "__artifact_class_module__": "tfx.types.standard_artifacts", "__artifact_class_name__": "TransformGraph"}]}', 
'--outputs', '{"best_hyperparameters": [{"artifact": {"id": "9", "type_id": "28", "uri": "gs://qwiklabs-gcp-01-8b52d2fcf958-kubeflowpipelines-default//tfx_covertype_lab_04/aec7304e-9b62-4712-ad01-85d8dd6f711d/Tuner/best_hyperparameters/8", "custom_properties": {"name": {"string_value": "best_hyperparameters"}, "producer_component": {"string_value": "Tuner"}}}, "artifact_type": {"id": "28", "name": "HyperParameters"}, "__artifact_class_module__": "tfx.types.standard_artifacts", "__artifact_class_name__": "HyperParameters"}]}', '--exec-properties', '{"custom_config": "{\\"ai_platform_training_args\\": {\\"masterConfig\\": {\\"imageUri\\": \\"gcr.io/qwiklabs-gcp-01-8b52d2fcf958/tfx_covertype_lab_04@sha256:f29bff7ce54b6232257cd50901602a0349b3f07ca7df72bbbe7c94837e45925f\\"}, \\"project\\": \\"qwiklabs-gcp-01-8b52d2fcf958\\", \\"region\\": \\"us-central1\\", \\"serviceAccount\\": \\"tfx-tuner-caip-service-account@qwiklabs-gcp-01-8b52d2fcf958.iam.gserviceaccount.com\\"}}", "eval_args": "{\\"num_steps\\": 500}", "kfp_pod_name": "tfx-covertype-lab-04-h5w92-4165073768", "module_file": "model.py", "train_args": "{\\"num_steps\\": 5000}", "tune_args": "{\\n \\"num_parallel_trials\\": 3\\n}", "tuner_fn": null}']}, 'serviceAccount': 'tfx-tuner-caip-service-account@qwiklabs-gcp-01-8b52d2fcf958.iam.gserviceaccount.com'}, 'createTime': '2024-08-25T21:00:25Z', 'startTime': '2024-08-25T21:09:45Z', 'endTime': '2024-08-25T21:09:48Z', 'state': 'FAILED', 'errorMessage': 'The replica worker 0 exited with a non-zero status of 1. Termination reason: Error. 
To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=478749246700&resource=ml_job%2Fjob_id%2Ftfx_tuner_20240825210006&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22tfx_tuner_20240825210006%22 ', 'trainingOutput': {}, 'labels': {'tfx_executor': 'ensions-google_cloud_ai_platform-tuner-executor-_workerexecutor', 'tfx_py_version': '3-7', 'tfx_runner': 'kfp', 'tfx_version': '0-25-0'}, 'etag': 'iPZ4v+6czZE=', 'jobPosition': '0'}. time="2024-08-25T21:09:59.279Z" level=info msg="sub-process exited" argo=true error="<nil>" time="2024-08-25T21:09:59.279Z" level=error msg="cannot save artifact /mlpipeline-ui-metadata.json" argo=true error="stat /mlpipeline-ui-metadata.json: no such file or directory" Error: exit status 1

Vu N. · Reviewed about 1 year ago

Muhammad H. · Reviewed about 1 year ago

Xiong w. · Reviewed about 1 year ago

Probably the worst of your labs. And that's saying a lot because few work at all. Aren't you embarrassed to waste our time like this? How do you sleep at night?

Ilsa C. · Reviewed about 1 year ago

incomplete lab info

Rupesh K. · Reviewed about 1 year ago

Srinibash S. · Reviewed about 1 year ago

Americo V. · Reviewed about 1 year ago

Manuel G. · Reviewed about 1 year ago

Muhammad H. · Reviewed about 1 year ago

SHIVANK U. · Reviewed about 1 year ago

Eddison L. · Reviewed about 1 year ago

Pablo R. · Reviewed about 1 year ago

Muhammad H. · Reviewed about 1 year ago

multiple errors in the lab

Cristina D. · Reviewed about 1 year ago

Pipeline deployment fails every time, even if all steps are followed properly.

Aiswaria A. · Reviewed about 1 year ago

We do not ensure the published reviews originate from consumers who have purchased or used the products. Reviews are not verified by Google.