Recovery steps for Embedded Cluster and BYO upgrade failures

Upgrade issues caused by missing Dynamic Thresholds app

Before you upgrade an Embedded Cluster (EC) or Bring Your Own (BYO) setup, re-enable the Dynamic Thresholds app (Signal Generator). If you upgrade without it enabled, the deployment fails and the clear-dpd-tasks-job job becomes stuck.

If you see a Deploy Failed error after upgrading without Dynamic Thresholds enabled, follow the steps below to recover and complete the deployment.

Deploy failed

  1. Check if the clear-dpd-tasks-job is stuck or incomplete.

  2. Run the following command to list all jobs in the namespace.

    kubectl get jobs -n <namespace>
    

    If you see clear-dpd-tasks-job in the output with a Running status, it is stuck or has not completed.

    NAME                                      STATUS     COMPLETIONS   DURATION   AGE
    clear-dpd-tasks-job                       Running    0/1           5m25s      5m25s
    embedded-cluster-upgrade-<id>             Complete   1/1           30s        7m47s
    
  3. Use the following command to check the pod created by the clear-dpd-tasks-job job in the namespace.

    kubectl get pods -n <namespace> | grep clear-dpd-tasks-job
    

    Example output:

    clear-dpd-tasks-job-cq4z6            0/1     ContainerCreating   0         12s
    
  4. Delete the stuck clear-dpd-tasks-job.

    kubectl delete job clear-dpd-tasks-job -n <namespace>
    

    Example output:

    job.batch "clear-dpd-tasks-job" deleted
    
  5. Verify that the job has been removed. You should no longer see clear-dpd-tasks-job listed.

    kubectl get jobs -n <namespace>
    

    Example output:

    NAME                                      STATUS     COMPLETIONS   DURATION   AGE
    embedded-cluster-upgrade-<id>             Complete   1/1           30s        8m11s
    
  6. Go to the KOTS Admin Console and click the Redeploy button.

  7. Wait for the deployment to complete.

  8. Confirm that the deployment was successful and the system is operating as expected.

Deploy success
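
The manual check, delete, and verify steps above can be sketched as a small shell helper. This is a minimal illustration rather than an official recovery tool: the function names (job_exists, job_succeeded_count, remove_stuck_job) and the JOB_NAME variable are hypothetical, and the sketch assumes kubectl is already configured for the target cluster. It treats the job as stuck when it exists but reports no succeeded pods, which is an assumption that matches the 0/1 completions shown in the example output.

```shell
#!/usr/bin/env bash
# Sketch: remove a stuck clear-dpd-tasks-job and verify that it is gone.
# JOB_NAME and the function names are illustrative assumptions.

JOB_NAME="${JOB_NAME:-clear-dpd-tasks-job}"

# Succeeds if the job exists in the given namespace.
job_exists() {
  kubectl get job "$JOB_NAME" -n "$1" >/dev/null 2>&1
}

# Prints the number of succeeded pods for the job (0 when the field is unset).
job_succeeded_count() {
  local count
  count="$(kubectl get job "$JOB_NAME" -n "$1" -o jsonpath='{.status.succeeded}' 2>/dev/null)"
  echo "${count:-0}"
}

# Deletes the job only when it exists and has not completed.
remove_stuck_job() {
  local ns="$1"
  if ! job_exists "$ns"; then
    echo "$JOB_NAME not found in namespace $ns; nothing to delete."
    return 0
  fi
  if [ "$(job_succeeded_count "$ns")" -ge 1 ]; then
    echo "$JOB_NAME already completed; leaving it in place."
    return 0
  fi
  # Show the job's pod(s) for reference before deleting.
  kubectl get pods -n "$ns" | grep "$JOB_NAME"
  kubectl delete job "$JOB_NAME" -n "$ns"
  # Verify that the job is no longer listed.
  if job_exists "$ns"; then
    echo "$JOB_NAME is still present; check the cluster before redeploying." >&2
    return 1
  fi
  echo "$JOB_NAME removed; you can now redeploy from the KOTS Admin Console."
}
```

To use it, source the file and run remove_stuck_job <namespace>. The redeploy itself is still done from the KOTS Admin Console, as described in step 6.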
