How to track down what caused a spontaneous reboot?


    Been testing Untangle for a couple of weeks, and this morning the Dell server I'm running it on did a spontaneous reboot. Everything was running smoothly when I noticed a website wasn't opening in a new tab in my browser, but my VoIP phone was still working, so I didn't think it was the firewall. Seconds later, my VoIP phone died and I heard the fans on my Dell go into overdrive (like they do when I first turn the server on or during high CPU load), so I walked around the corner and saw that the system was indeed rebooting.

    I got copies of the following files: syslog, daemon.log, debug... Is there a particular word or phrase I should look for to figure out what caused the crash/reboot? I'd post them somewhere, but the syslog and daemon.log files are 13 GB each and debug is 5 GB.
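With logs this large, one starting point is to read only what was written just before the machine came back up. This is a minimal sketch, assuming the kernel's boot banner (a line containing `Linux version`) is logged to syslog at startup, as is typical on Debian-based systems; the helper name `before_boot` and the default of 40 lines are hypothetical choices for illustration. If the box hard-reset without the kernel getting a chance to write anything, these lines may show nothing unusual.

```shell
# before_boot FILE [LINES]
# Hypothetical helper: print the LINES lines (default 40) that precede each
# kernel boot banner in FILE, i.e. the last messages logged before a restart.
# Assumption: each boot is marked by a line containing "Linux version".
before_boot() {
    file=$1
    lines=${2:-40}
    # -B prints LINES of leading context before every match; grep streams
    # the file, so this works on multi-gigabyte logs. Separate boots in the
    # output are divided by "--" lines.
    grep -B "$lines" 'Linux version' "$file"
}
```

For example, `before_boot /var/log/syslog 100 | less` shows the 100 lines leading up to each boot recorded in that file.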

  • #2
    Random reboots are usually caused by a bad driver or defective RAM. You need to run a hardware diagnostic on the entire platform.
    Rob Sandling, BS:SWE, MCP, Microsoft Certified: Azure Administrator Associate
    NexgenAppliances.com
    Phone: 866-794-8879 x201
    Email: [email protected]


    • #3
      Did you import a backup from 16.5.2? I had reboots and network failures when I did. Installed clean and am redoing my config a piece at a time. Works fine so far.


      • #4
        Originally posted by sky-knight
        Random reboots are usually a bad driver, or defective RAM. You need to do a hardware diagnostic on the entire platform.
        I tested all 512 GB of ECC RAM for several hours, and no errors have been reported so far. The NIC is an Intel I350 DP and appears to be working A-OK. RAID (mirror) SAS drives all tested OK. No big deal. I think I'll switch back to my ASUS router until v17 drops, then give it another go.

        I really appreciate all your replies and assistance over the past few weeks!


        • #5
          Originally posted by donhwyo
          Did you import a backup from 16.5.2? I had reboots and network failures when I did. Installed clean and am redoing my config a piece at a time. Works fine so far.
          Nope, it was a fresh install of 16.6.2 and no restores.


          • #6
            Originally posted by road hazard

            RAID (mirror) SAS drives all tested OK. No big deal.
            I found the problem... Dell RAID controllers and Linux have a spotty history; don't use them. Redundancy for the UTM is achieved via a second platform and VRRP. If you must use a Dell platform with RAID, then your best bet is to install vSphere on the platform, install all the Dell-provided components that support the VMware platform, and then deploy NGFW as a VM on that platform.

            Yes, it's more complex, but it's also supported. Hardware issues are a pain, and supporting your own hardware is work. NGFW supports only its own appliances and VMware for a reason!

            • #7
              Originally posted by sky-knight

              I found the problem... Dell RAID controllers and Linux have a spotty history; don't use them. Redundancy for the UTM is achieved via a second platform and VRRP. If you must use a Dell platform with RAID, then your best bet is to install vSphere on the platform, install all the Dell-provided components that support the VMware platform, and then deploy NGFW as a VM on that platform.

              Yes, it's more complex, but it's also supported. Hardware issues are a pain, and supporting your own hardware is work. NGFW supports only its own appliances and VMware for a reason!
              Good to know!

              Would there be some sort of disk/RAID error message in those log files that I could look for to know 100% whether that was the problem?


              • #8
                Originally posted by road hazard

                Good to know!

                Would there be some sort of disk/RAID error message in those log files that I could look for to know 100% whether that was the problem?
                Yeah, it's buried somewhere in the Debianese that changes with every kernel. There's a kernel panic and a stack dump, but if you can read that mess, you're a smarter man than I.
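Rather than reading the whole file, a kernel panic, stack dump, or storage error can be searched for directly. This is a sketch under assumptions: the helper name `crash_hints` is hypothetical, the pattern list is illustrative rather than exhaustive, and `megaraid_sas` is assumed as the driver name because many LSI-based Dell PERC controllers use it (check `lsmod` on your own system for the actual driver).

```shell
# crash_hints FILE...
# Hypothetical helper: print line-numbered, case-insensitive matches for
# common signs of a kernel crash (panic, oops, call trace, hung task) or of
# disk/RAID trouble. With several files, grep prefixes each hit with its name.
# Assumption: the RAID controller uses the megaraid_sas driver.
crash_hints() {
    grep -n -i -E \
        'kernel panic|oops|call trace|hung_task|i/o error|megaraid_sas.*(error|fail|reset|timeout)|scsi.*(error|fail|timeout)' \
        "$@"
}
```

Running `crash_hints /var/log/syslog /var/log/daemon.log` gives line numbers you can then jump to, or feed into `sed -n 'START,ENDp'` to read the surrounding context.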

                • #9
                  You might try
                  Code:
                  dmesg | grep -iE "error|fail"
                  or
                  Code:
                  grep -iE "error|fail" /var/log/syslog
                  Or try other log files. You might get some clues.
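One caveat: `dmesg` only holds kernel messages from the current boot, so after a reboot it will not show what led up to the crash. If systemd-journald keeps a persistent journal, the previous boot's kernel messages can be filtered the same way. A minimal sketch, assuming a persistent journal under `/var/log/journal` (with journald's default `Storage=auto`, the journal is persistent only if that directory exists); the helper name `prev_boot_errors` is hypothetical.

```shell
# prev_boot_errors [JOURNAL_DIR]
# Hypothetical helper: show kernel messages from the boot before the current
# one (-k = kernel only, -b -1 = previous boot), filtered for errors/failures.
# Assumption: a persistent journal exists in JOURNAL_DIR (default
# /var/log/journal); a volatile journal does not survive the reboot.
prev_boot_errors() {
    journal_dir=${1:-/var/log/journal}
    if [ ! -d "$journal_dir" ]; then
        echo "no persistent journal in $journal_dir; try /var/log/kern.log or /var/log/syslog instead" >&2
        return 1
    fi
    journalctl -D "$journal_dir" -k -b -1 --no-pager | grep -iE 'error|fail'
}
```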

                  Most Dell PERC cards are LSI-based (or whoever bought them out this week), and some can be flashed to the non-Dell version of the firmware. That is not supported, but it works for some people.