Architecture:
1. This is a home NAS, so corners were cut in ways they would not be in a true production enterprise environment.
2. The zpool consists of 8 x 8TB drives in RAIDZ2 (upgraded from the original 8 x 4TB).
3. Two SSDs with multiple partitions for: 1) boot (Ubuntu 20.04 on a btrfs mirror; when the NAS was built, booting from ZFS was not widely supported in Ubuntu, so even though I hate btrfs, that is what we have), 2) OS swap, 3) ZFS cache, 4) ZFS log.
4. Motherboard: 8-core ASRock C2750D4I with 32GB ECC memory.

Symptoms Observed and Tests Conducted:
1. Over the last couple of months, during scrubs (and only during scrubs), miscellaneous read and write errors would pop up, sometimes to the point where a drive would be faulted. It was always the same two drives, the ones at the bottom of the drive list in zpool status.
2. Checking the drives' SMART data never showed an issue. The drives were old, but there were no Offline_Uncorrectable or Current_Pending_Sector counts or other attributes indicating a problem with the drives.
3. smartctl long tests on the drives with errors completed with no issues.
4. Swapping the drives into different NAS bays had no impact.
5. Swapping the drives for known-good drives made no difference.
6. Swapping the NAS power supply for a newer, beefier, higher-capacity 80 PLUS unit made no difference.
7. Changed the SATA cables. No impact.
8. Moved the affected drive to an external SATA controller card. No change.
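
The SMART check in item 2 can be made repeatable across every device in the box, SSDs included. Below is a minimal Python sketch, assuming the text comes from `smartctl -A` and follows the usual ten-column ATA attribute table. The `SAMPLE` text is a made-up illustration, not output from this NAS, and the helper names are hypothetical. Attribute names vary by vendor, and NVMe devices report a different format, so treat the watched list as an assumption.

```python
# Attributes named in the post, plus reallocated sectors as a common companion check.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

# Hypothetical sample shaped like the attribute table printed by `smartctl -A`.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   045   045   000    Old_age   Always       -       48211
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for rows whose raw value is a plain integer."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        try:
            attrs[parts[1]] = int(parts[9])
        except ValueError:
            # Some raw values are not plain integers; this sketch skips them.
            continue
    return attrs


def nonzero_watched(attrs, watched=WATCHED):
    """Return only the watched attributes whose raw value is non-zero."""
    return {name: attrs[name] for name in watched if attrs.get(name, 0) != 0}


if __name__ == "__main__":
    print(nonzero_watched(parse_smart_attributes(SAMPLE)))
```

In practice the text would come from running smartctl against each device in turn. An empty result only means the watched counters are zero, which, as the tests above show, does not rule out a problem elsewhere.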

Resolution?
While running the eighth and last test above (honestly, I could not think of any more tests to perform), the NAS failed to reboot. One of the two SSDs was as dead as a doornail. In retrospect, it appears it had been failing for a while. I replaced the dead SSD by adding a new one to the btrfs boot mirror, then recreated the pool's log and cache using the existing SSD and the new SSD, and the issue seems to have gone away. I ran a scrub and there were no errors. Does this make sense? Can a failing log or cache drive manifest (falsely or misleadingly) as read or write errors on the data drives in a pool?
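
For anyone comparing notes, it can help to see the error counters next to the role each device plays in the pool (data, logs, or cache). Below is a rough Python sketch, assuming the text is shaped like `zpool status` output. The pool name, device names, and counts in `SAMPLE` are invented for illustration and are not from this NAS, and the function names are hypothetical. It does not answer the question above; it only makes the layout easier to read.

```python
import re

# Hypothetical sample shaped like `zpool status` output; all names and counts are made up.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

    NAME        STATE     READ WRITE CKSUM
    tank        DEGRADED     0     0     0
      raidz2-0  DEGRADED     0     0     0
        sda     ONLINE       0     0     0
        sdg     ONLINE       3     1     0
        sdh     FAULTED     12     7     0  too many errors
    logs
      mirror-1  ONLINE       0     0     0
        sdi4    ONLINE       0     0     0
        sdj4    ONLINE       0     0     0
    cache
      sdi3      ONLINE       0     0     0
      sdj3      ONLINE       0     0     0

errors: No known data errors
"""

# name, state, then the READ / WRITE / CKSUM columns.
ROW = re.compile(r"^\s+(\S+)\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)")
SECTIONS = {"logs", "cache", "spares", "special", "dedup"}


def parse_zpool_status(text):
    """Return one dict per config row, tagged with the section it appears under."""
    rows, section, in_config = [], "data", False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("NAME") and "CKSUM" in stripped:
            in_config = True
            continue
        if stripped.startswith("errors:"):
            in_config, section = False, "data"
            continue
        if not in_config:
            continue
        if stripped in SECTIONS:
            section = stripped
            continue
        match = ROW.match(line)
        if match:
            name, state, read, write, cksum = match.groups()
            # Counters stay as strings in case large counts are abbreviated.
            rows.append({"name": name, "state": state, "section": section,
                         "read": read, "write": write, "cksum": cksum})
    return rows


def with_errors(rows):
    """Return rows where any of the three counters is not "0"."""
    return [r for r in rows if any(r[k] != "0" for k in ("read", "write", "cksum"))]


if __name__ == "__main__":
    for row in with_errors(parse_zpool_status(SAMPLE)):
        print(row["section"], row["name"], row["state"],
              row["read"], row["write"], row["cksum"])
```

Keeping the section tag on every row makes it easy to list the log and cache devices separately, so they can be given the same SMART checks as the data drives.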
07-28-2020, 03:26 AM