How to debug cloud-init - cloud-init 25.1.2 documentation

There are several cloud-init failure modes that one may need to debug. Debugging is specific to the scenario, but the starting points are often similar:

I can’t log in to my instance

One of the more challenging scenarios to debug is when you don’t have shell access to your instance. You have a few options:

  1. Acquire log messages from the serial console and check for any errors.
  2. To access instances without SSH available, create a user with password access (using the user-data) and log in via the cloud serial port console. This only works if cc_users_groups successfully ran.
  3. Try running the same user-data locally, such as in one of the tutorials. Use LXD or QEMU locally to get a shell or logs, then debug with these steps.
  4. Try copying the image to your local system, mounting the filesystem locally, and inspecting the image logs for clues.
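Option 2 above could look like the following minimal sketch. The file name user-data.yaml, the user name debug, and the password are placeholders chosen for illustration, not values from this guide. The users, lock_passwd, plain_text_passwd and sudo keys are handled by the users and groups module, so this only helps if that module runs. Use a throwaway password on a test instance only.

```shell
# Write a minimal cloud-config that creates a password-login user for
# serial-console debugging. File name, user name and password are
# placeholders (assumptions), not values taken from this guide.
cat > user-data.yaml <<'EOF'
#cloud-config
users:
  - name: debug
    lock_passwd: false
    plain_text_passwd: change-me
    sudo: "ALL=(ALL) NOPASSWD:ALL"
EOF

# The first line must be exactly "#cloud-config" for the file to be
# treated as cloud config.
head -n 1 user-data.yaml
```

If cloud-init is installed locally, running cloud-init schema --config-file user-data.yaml can catch mistakes in the file before it is passed to an instance.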

Cloud-init did not run

  1. Check the output of cloud-init status --long
    • what is the value of the 'extended_status' key?
    • what is the value of the 'boot_status_code' key?
      See our reported status explanation for more information on the status.
  2. Check the contents of /run/cloud-init/ds-identify.log
    This log file is used when the platform that cloud-init is running on is detected. This stage enables or disables cloud-init.
  3. Check the status of the services
    systemctl status cloud-init-local.service cloud-init-network.service \
    cloud-config.service cloud-final.service
    Cloud-init may have started to run, but not completed. This shows how many, and which, cloud-init stages completed.
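The three checks above can be run in one pass. This is a sketch, not an official tool: check_cloud_init_ran is a hypothetical helper name, and it skips any command or file that is missing rather than failing, so it can also be run inside a mounted image or a minimal container.

```shell
# Sketch: run the three "did cloud-init run?" checks in order, skipping
# any tool or file that is missing. check_cloud_init_ran is a
# hypothetical helper name.
check_cloud_init_ran() {
    # Optional first argument overrides the ds-identify log location.
    ds_log=${1:-/run/cloud-init/ds-identify.log}

    echo "== cloud-init status =="
    if command -v cloud-init >/dev/null 2>&1; then
        # Keep going even if the status command exits non-zero.
        cloud-init status --long || true
    else
        echo "cloud-init: command not found"
    fi

    echo "== ds-identify log =="
    if [ -r "$ds_log" ]; then
        cat "$ds_log"
    else
        echo "$ds_log: not readable"
    fi

    echo "== stage services =="
    if command -v systemctl >/dev/null 2>&1; then
        for unit in cloud-init-local.service cloud-init-network.service \
                    cloud-config.service cloud-final.service; do
            # is-active prints a single state word for the unit.
            printf '%s: %s\n' "$unit" "$(systemctl is-active "$unit" 2>/dev/null)"
        done
    else
        echo "systemctl: command not found"
    fi
}
```

Calling check_cloud_init_ran with no arguments reads the default log path; passing a path is useful when the instance's filesystem is mounted somewhere else.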

Cloud-init ran, but didn’t do what I want it to

  1. If you are using cloud-init’s user-data cloud config, make sure to validate your user-data cloud config.
  2. Check for errors in cloud-init status --long
    • what is the value of the 'errors' key?
    • what is the value of the 'recoverable_errors' key?
      See our guide on exported errors for more information on these exported errors.
  3. For more context on errors, check the log files:
    • /var/log/cloud-init.log
    • /var/log/cloud-init-output.log
      Identify errors in the logs and the lines preceding these errors.
      Ask yourself:
    • According to the log files, what went wrong?
    • How does the cloud-init error relate to the configuration provided to this instance?
    • What does the documentation say about the parts of the configuration that relate to this error? Did a configuration module fail?
    • What failure state is cloud-init in?
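To find errors together with the lines preceding them, a small grep wrapper can help. This is a sketch: show_log_errors is a hypothetical helper, and the search pattern and the five lines of leading context are assumptions to adjust for your logs.

```shell
# Sketch: print error-like lines from one or more log files together
# with the five lines that precede each match. show_log_errors is a
# hypothetical helper; the pattern is an assumption, widen it as needed.
show_log_errors() {
    for log in "$@"; do
        if [ -r "$log" ]; then
            echo "== $log =="
            # -n: line numbers, -i: ignore case, -B 5: leading context
            grep -n -i -B 5 -E 'error|traceback|failed|warning' "$log" \
                || echo "(no matching lines)"
        else
            echo "== $log: not readable =="
        fi
    done
}

# Usage on the affected instance:
#   show_log_errors /var/log/cloud-init.log /var/log/cloud-init-output.log
```

The leading context matters because the line that names the failing module or command is often logged just before the error itself.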

Cloud-init never finished running

There are many reasons why cloud-init may fail to complete. Some reasons are internal to cloud-init, but in other cases, cloud-init’s failure to complete may be a symptom of failure in other components of the system, or the result of a user configuration.

External reasons

Internal reasons

To start debugging

  1. Check dmesg for errors:
    dmesg -T | grep -i -e warning -e error -e fatal -e exception
  2. Investigate other systemd services that failed
  3. Check the output of cloud-init status --long
    • what is the value of the 'extended_status' key?
    • what is the value of the 'boot_status_code' key?
      See our guide on exported errors for more information on these exported errors.
  4. Inspect the boot stage of running services:
    $ systemctl list-jobs --after
    JOB UNIT                                                TYPE  STATE
    150 cloud-final.service                                 start waiting
    └─      waiting for job 147 (cloud-init.target/start)   -     -
    155 blocking-daemon.service                             start running
    └─      waiting for job 150 (cloud-final.service/start) -     -
    147 cloud-init.target                                   start waiting

    3 jobs listed.
    In the above example we can see that cloud-final.service is waiting and is ordered before cloud-init.target, and that blocking-daemon.service is currently running and is ordered before cloud-final.service. From this output, we deduce that cloud-init is not complete because the service named blocking-daemon.service hasn’t yet completed, and that we should investigate blocking-daemon.service to understand why it is still running.
  5. Use the PID of the running service to find all running subprocesses. Any running process that was spawned by cloud-init may be blocking cloud-init from continuing.
    Ask yourself:

  6. For more context on errors, check the log files:
    • /var/log/cloud-init.log
    • /var/log/cloud-init-output.log
      Identify errors in the logs and the lines preceding these errors.
      Ask yourself:
    • According to the log files, what went wrong?
    • How does the cloud-init error relate to the configuration provided to this instance?
    • What does the documentation say about the parts of the configuration that relate to this error?
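The steps for failed services and running subprocesses can be sketched in shell. show_unit_processes is a hypothetical helper name; blocking-daemon.service is the unit from the example output above, so substitute the unit that systemctl list-jobs reports on your instance. The sketch assumes a systemd recent enough to support the --value option and falls back to ps when pstree is not installed.

```shell
# Sketch: show the process tree under a unit's main PID.
# show_unit_processes is a hypothetical helper name.
show_unit_processes() {
    unit=$1
    pid=$(systemctl show -p MainPID --value "$unit" 2>/dev/null)
    # MainPID is 0 (or empty) when the unit has no running main process.
    if [ -z "$pid" ] || [ "$pid" = "0" ]; then
        echo "$unit: no running main process"
        return 1
    fi
    if command -v pstree >/dev/null 2>&1; then
        # Full tree below the main PID, with PIDs shown.
        pstree -p "$pid"
    else
        # Fallback: direct children only.
        ps -o pid,ppid,etime,args --ppid "$pid"
    fi
}

# Usage on the affected instance:
#   systemctl --failed --no-pager
#   show_unit_processes blocking-daemon.service
```

A long elapsed time on a child process is a hint that it, rather than cloud-init itself, is what the boot is waiting on.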