To: gnats@openbsd.org
Subject: i386/3019: 3.2-stable: system repeatedly freezes solid (was fine with 3.1-stable)
From: ewen@naos.co.nz
Date: Mon, 9 Dec 2002 10:32:03 +1300 (NZDT)
Cc: ewen@naos.co.nz
Reply-to: ewen@naos.co.nz
Resent-date: Mon, 9 Dec 2002 16:58:40 -0700 (MST)
Resent-from: gnats@cvs.openbsd.org (GNATS Filer)
Resent-message-id: <200212092358.gB9NweqI018881@cvs.openbsd.org>
Resent-reply-to: gnats@cvs.openbsd.org, ewen@naos.co.nz
Resent-to: bugs@cvs.openbsd.org
Sender: owner-bugs@openbsd.org
>Number:         3019
>Category:       i386
>Synopsis:       System repeatedly freezes (solid) with 3.2 (was stable with 3.1)
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Dec 09 16:58:39 MST 2002
>Closed-Date:
>Last-Modified:
>Originator:     Ewen McNeill
>Release:        OPENBSD_3_2
>Organization:
>Environment:
        System      : OpenBSD 3.2
        Architecture: OpenBSD.i386
        Machine     : i386

>Description:
        During the upgrade to 3.2 (booted off floppy32.fs) the system
        repeatedly froze completely during the FTP download of files from
        a local FTP server, usually between 10% and 50% of the way into
        downloading base32.tgz.  When the system froze there was no
        activity for several minutes, the system no longer responded on
        its network interface (e.g., no ping responses), and a reset was
        required to resume operation.  After half a dozen attempts it
        appeared impossible to perform a network-based upgrade, and the
        system was eventually upgraded by downloading the relevant files
        while booted under 3.1 and then performing a mounted-file-system
        upgrade.

        The system had previously run OpenBSD 3.1 (-stable) perfectly stably
        with the most recent uptime being 114 days (time since last rebooted
        after patching).  So the issue appears newly introduced with 3.2.

        After upgrade the system initially appeared stable, but has frozen
        twice in the last 16 hours (requiring a system reset) while running
        a GENERIC kernel built against the 3.2-stable branch (ie, with errata
        compiled in).  The second freeze was while running with pcibios0 
        disabled (as per instructions in INSTALL.i386).

        This is far from stable enough for production use (system is a 
        VPN gateway), and I will probably have to revert to 3.1 unless
        there is a possible fix to try available soon.

>How-To-Repeat:
        Based on the instances where it occurred, and frequency of occurrence,
        the issue appears to be either with disk I/O, or overlapped disk 
        and network I/O, or possibly with just network I/O, and appears
        to be a race condition, perhaps an interrupt-related one.

        The system handles a fair amount of network traffic, and network
        traffic alone doesn't appear to trigger it.

>Fix:
        None known so far.  Disabling pcibios0 (flags 3 on bios0) does
        not appear to make a difference (this was suggested in INSTALL.i386).

dmesg output (clean boot, no config options):

OpenBSD 3.2-stable (build) #0: Sun Dec  8 02:05:09 NZDT 2002
    ewen@vm-openbsd.em.naos.co.nz:/home/ewen/kernel/build
cpu0: F00F bug workaround installed
cpu0: Intel Pentium (P54C) ("GenuineIntel" 586-class) 99 MHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,MCE,CX8
real mem  = 49922048 (48752K)
avail mem = 40730624 (39776K)
using 635 buffers containing 2600960 bytes (2540K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(bd) BIOS, date 05/05/98, BIOS32 rev. 0 @ 0xfcd20
pcibios0 at bios0: rev. 2.1 @ 0xf0000/0x69c
pcibios0: PCI BIOS has 5 Interrupt Routing table entries
pcibios0: PCI Interrupt Router at 000:01:0 ("SIS 85C503 ISA" rev 0x00)
pcibios0: PCI bus #0 is the last bus
bios0: ROM list: 0xc0000/0x8000
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 "SIS 5511" rev 0x00
pcib0 at pci0 dev 1 function 0 "SIS 85C503 ISA" rev 0x01
pciide0 at pci0 dev 1 function 1 "SIS 5513 EIDE" rev 0x07: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
wd0 at pciide0 channel 0 drive 0: <WDC AC2540F>
wd0: 16-sector PIO, LBA, 515MB, 1048 cyl, 16 head, 63 sec, 1056384 sectors
wd0(pciide0:0:0): using PIO mode 3
xl0 at pci0 dev 10 function 0 "3Com 3c905B 100Base-TX" rev 0x30: irq 12 address 00:04:76:73:04:75
exphy0 at xl0 phy 24: 3Com internal media interface
xl1 at pci0 dev 11 function 0 "3Com 3c905B 100Base-TX" rev 0x30: irq 10 address 00:04:76:73:4e:2a
exphy1 at xl1 phy 24: 3Com internal media interface
vga1 at pci0 dev 12 function 0 "S3 Trio64V2/DX" rev 0x14
wsdisplay0 at vga1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
isa0 at pcib0
isadma0 at isa0
ast0 at isa0 port 0x1a0/32 irq 5
pccom3 at ast0 slave 0: ns16550a, 16 byte fifo
pccom4 at ast0 slave 1: ns16550a, 16 byte fifo
pccom5 at ast0 slave 2: ns16550a, 16 byte fifo
pccom6 at ast0 slave 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
sysbeep0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
npx0 at isa0 port 0xf0/16: using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
biomask 4040 netmask 5440 ttymask 54e2
pctr: 586-class performance counters and user-level cycle counter enabled
dkcsum: wd0 matched BIOS disk 80
root on wd0a
rootdev=0x0 rrootdev=0x300 rawdev=0x302
WARNING: / was not properly unmounted
lpt0: offline
lpt0: output error

I no longer have a 3.1 dmesg online, it seems (the last 3.1 reboot was
in August, and syslog messages are kept for only about 5 weeks), but a
printed copy (I write some syslog messages to the line printer)
indicates similar dmesg contents, except that there were "command
never completed!" messages reported for xl0 and xl1 during initial
probing.  3.1 ran fine, without freezes, and was very stable, despite
those messages.
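
Since the freezes look interrupt-related, it may help to read the
interrupt masks line in the dmesg above (biomask 4040 netmask 5440
ttymask 54e2).  The following is a minimal Python sketch, assuming that
the low 16 bits of each mask correspond to hardware IRQ lines 0-15.  The
function name irqs_in_mask is made up for illustration and is not part
of any OpenBSD tool.

```python
def irqs_in_mask(mask_hex):
    """Return the IRQ numbers (0-15) whose bits are set in a hex mask string.

    Assumption: bit N of the mask stands for hardware IRQ N.
    """
    mask = int(mask_hex, 16)
    return [irq for irq in range(16) if mask & (1 << irq)]


# Values copied from the dmesg line "biomask 4040 netmask 5440 ttymask 54e2".
masks = {"biomask": "4040", "netmask": "5440", "ttymask": "54e2"}

for name, value in masks.items():
    print(name, irqs_in_mask(value))
```

Under that assumption biomask covers IRQs 6 and 14 (fdc0 is at irq 6
above; 14 is the conventional primary IDE interrupt, which the dmesg
does not state explicitly), and netmask adds IRQs 10 and 12, which match
xl1 and xl0.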


>Release-Note:
>Audit-Trail:
>Unformatted:
 NOTE: resent due to previous bug report not appearing on bugs, perhaps
 due to a missing Subject: line.  This version has a few additional details.
