  Qt Quality Assurance Infrastructure / QTQAINFRA-3698

Host went into error state in OpenNebula (ON) because it lost its connection to the NFS server


Details

    • Bug
    • Resolution: Won't Do
    • Not Evaluated
    • None
    • unversioned
    • None
    • Ubuntu 18.04

    Description

      Symptom: OpenNebula shows the host in error state.

      Debug analysis: The shell and Ruby scripts that run the monitoring are stuck. They cannot get past the check "if -x /usr/lib/one/datastores/100", which points to a problem with the mount. Running "df" confirms this: it hangs as well. The system log shows many messages like these:

      Apr 28 19:16:25 on-ox kernel: nfs: server qt-nfs01 not responding, still trying
      Apr 28 18:52:56 on-ox kernel: nfs: server qt-nfs01 not responding, timed out
      Apr 28 18:38:16 on-ox kernel: nfs: server qt-nfs01 not responding, still trying
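
      A minimal sketch of how the same checks could be run without the calling shell hanging, assuming GNU coreutils `timeout` (installed by default on Ubuntu 18.04). The helper names `probe_mount` and `count_nfs_warnings` are hypothetical: `probe_mount` repeats the monitoring scripts' `-x` test in a child process that is killed after a few seconds, and `count_nfs_warnings` counts kernel lines of the form shown above in whatever log text it is given.

```shell
#!/usr/bin/env bash
# Sketch: check a possibly hung NFS mount without hanging the caller.
# Assumption: GNU coreutils `timeout` is installed (default on Ubuntu 18.04).

# probe_mount PATH [SECONDS]  (hypothetical helper)
# Runs the same `-x` test as the monitoring scripts, but as an external
# `test` process under `timeout`. Exit status:
#   0        PATH exists and is executable/searchable
#   124/137  the check timed out (SIGTERM / SIGKILL), e.g. NFS server not responding
#   1        PATH is missing or not executable
probe_mount() {
    local path=$1
    local secs=${2:-5}
    # SIGTERM after $secs seconds, then SIGKILL 2 seconds later if still running.
    timeout -k 2 "$secs" test -x "$path"
}

# count_nfs_warnings  (hypothetical helper)
# Reads log text on stdin and prints how many kernel
# "nfs: server ... not responding" lines it contains.
count_nfs_warnings() {
    # grep -c prints 0 and exits 1 when nothing matches; keep status 0.
    grep -c 'nfs: server .* not responding' || true
}
```

      For example, `probe_mount /usr/lib/one/datastores/100 5; echo $?` would be expected to report 124 or 137 while the server is unreachable (a timeout rather than a plain failure), and `count_nfs_warnings < /var/log/syslog` gives a rough count of the kernel messages.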
      

      Running "umount /usr/lib/one/datastores/100" does not help either; it hangs too.
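
      When a plain unmount hangs like this, one common way to escalate is sketched below, assuming root privileges, util-linux `umount` and GNU `timeout`: a plain unmount with a time limit, then `umount -f` (documented for unreachable NFS servers), then `umount -l` (lazy detach). This is not what was done in this case. The function name `release_nfs_mount` and the `UMOUNT_CMD` override are hypothetical, the latter added only so the sketch can be dry-run with a fake command.

```shell
#!/usr/bin/env bash
# Sketch: release a hung NFS mount point, escalating step by step.
# Assumptions: run as root; GNU coreutils `timeout` and util-linux `umount`.
# UMOUNT_CMD is a hypothetical override hook (handy for dry runs/tests).

release_nfs_mount() {
    local path=$1
    local secs=${2:-10}
    local umount_cmd=${UMOUNT_CMD:-umount}

    # 1. Plain unmount, but give up after $secs seconds instead of hanging.
    if timeout -k 2 "$secs" "$umount_cmd" "$path"; then
        echo "unmounted $path"
        return 0
    fi
    # 2. Forced unmount, intended for an unreachable NFS server.
    if timeout -k 2 "$secs" "$umount_cmd" -f "$path"; then
        echo "force-unmounted $path"
        return 0
    fi
    # 3. Lazy unmount: detach now, clean up once the mount is no longer busy.
    if "$umount_cmd" -l "$path"; then
        echo "lazily detached $path"
        return 0
    fi
    echo "could not release $path" >&2
    return 1
}
```

      Usage (as root): `release_nfs_mount /usr/lib/one/datastores/100 10`. A lazy detach only hides the mount point; processes already blocked on it may stay blocked until the server answers again.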

      Solution: For reasons I don't understand, the problem went away after I installed tshark and started watching the network traffic between the NFS server and the host in question. Quantum mechanics at its best: observe it, and the behavior changes.
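
      For reference, a capture like the one described could be started roughly as follows. This is a sketch assuming tshark runs with capture privileges, the server name `qt-nfs01` from the log above, and NFS on its standard port 2049; the function name `nfs_capture` and the `TSHARK` override are hypothetical. Only a capture filter is applied, so TCP retransmissions and resets on the NFS connection remain visible.

```shell
#!/usr/bin/env bash
# Sketch: watch NFS traffic between this host and the NFS server.
# Assumptions: tshark is installed and allowed to capture; NFS uses port 2049.
# TSHARK is a hypothetical override hook (e.g. TSHARK=echo for a dry run).

# nfs_capture [SERVER] [INTERFACE]  (hypothetical helper)
nfs_capture() {
    local server=${1:-qt-nfs01}
    local iface=${2:-any}
    # -i: interface to capture on ("any" = all interfaces on Linux)
    # -f: capture filter (BPF syntax), applied before packets are decoded
    "${TSHARK:-tshark}" -i "$iface" -f "host $server and port 2049"
}
```

      Usage: `nfs_capture qt-nfs01` prints one summary line per packet until interrupted with Ctrl+C.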


          People

            Assignee: Tony Sarajärvi (tosaraja)
            Reporter: Tony Sarajärvi (tosaraja)
            Votes: 0
            Watchers: 1
