# FYI - a similar pool creation expanded out would look like this
sudo zpool create -f kinnaman raidz2 /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAG06BGA /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAG06EWA /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAG0DJ9A /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ93TMF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9ES2F /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9GPHF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9J1EF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9J59F /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9N1AF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9N2TF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9N3EF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PBJ76D4F
===Proxmox setup===
# Example of a Debian VM with a single core, 512MB of RAM, 10G HDD and connected to the "Bridge" interface <pre>qm create 104 --name "qm-test" --cdrom /var/lib/vz/template/iso/debian-9.4.0-amd64-netinst.iso --memory 512 --cores 1 --net0 "virtio,bridge=vmbr0" --scsi0 "file=peter:10,discard=on,size=10G"</pre>
=== RAID Status and How to Blink a Light and Replace a Drive ===
Thankfully the system is not in the middle of a woodshop, but the batch of Hitachi 1TB drives is pretty old and we should expect disk failures to happen. This is an overview of the tools available to diagnose the health of the array.

==== How is the ZFS Zpool Health, How is the Hardware Health ====
* Very likely you want to see how ZFS sees the drives. This command should suffice:
# zpool status -v
* You can check the list of hardware connected to the array via the LSI (Avago/Broadcom) utility <pre>sas2ircu</pre>
# sas2ircu 0 display
(You'll want to pipe this to less or a text file to scroll through the various notes.)
* Maybe you want to run through <pre>smartctl</pre> and see whether any of the disks are in a pre-fail state. Try a shell script like this (a helper for mapping the sd letters back to serial numbers follows below):
for i in {a..o}; do
  echo "Disk sd$i"
  # attributes 5/197/198 are reallocated, pending, and offline-uncorrectable sectors
  smartctl -i -A /dev/sd$i | grep -E "^ +5 |^197 |^198 |FAILING_NOW|Serial"
done
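The <pre>smartctl</pre> loop works in kernel names (sda, sdb, ...), while zpool and sas2ircu report drives by serial number. Here is a minimal sketch for mapping one naming scheme to the other; it only assumes the usual ata- prefixed links under /dev/disk/by-id and plain coreutils, nothing specific to this array:
for link in /dev/disk/by-id/ata-*; do
  case "$link" in *-part*) continue ;; esac  # skip partition links, keep whole disks
  printf '%s -> %s\n' "$(basename "$link")" "$(readlink -f "$link")"
done
The left column is the by-id name (ending in the drive serial number) that zpool commands expect; the right column is the /dev/sdX name the smartctl loop uses.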
==== ZFS Disk Death - what to do ====
If one or two disks die in the ZFS zpool, you'll want to replace them. You'll see something like a disk or two with the status UNAVAIL and the zpool state being DEGRADED.
We don't want to shut off the computer, so what do we do?
* Make note of the disk ID(s) and search for those drives by doing
# sas2ircu 0 display | less
* While scrolling up and down using less, you can find the serial number of the affected dying drive (it starts with the letter P in our Hitachi examples).
* Make a note of the enclosure number and the slot number reported by the controller in the command above.
* Make the affected disk(s) blink in their slots if you have enclosures that blink properly, such as
# sas2ircu 0 locate 2:1 on #REPLACE WITH THE RIGHT ENCLOSURE AND SLOT ID, DON'T JUST CUT AND PASTE THIS COMMAND AND REPLACE THE WRONG DRIVE. (This blinks drive 1 in enclosure 2.)
* Then you'll see the blinking slot(s) and can remove the affected disks.
* Replace the drives in the disk trays (you may need a Torx T10 driver or a careful flathead screwdriver to swap drives in the tray), and then reinsert them.
* Turn the blinking light off.
# sas2ircu 0 locate 2:1 off
* Find the new drive by checking dmesg for the most recently added disk and then poking around /dev/disk/by-id for the matching serial number (a sketch of this follows at the end of this section). Example disk replacement (remember, use zpool status to find the old disk to replace):
# zpool replace -f peter ata-HITACHI_HUA721010KLA330_PAJ9N3EF ata-HITACHI_HUA721010KLA330_PBJ7DNWE
You can then run <pre>zpool status -v</pre> to see the replacement in progress and an estimate of how long it will take to finish resilvering onto the new drive. Nice!
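For that "find the new drive" step, here is a hedged sketch of what it usually looks like; the device name sdo is an assumed example (read the real name out of dmesg), and peter is the pool name from the replace command above:
# dmesg | tail -n 40 # note the kernel name of the freshly inserted disk (sdo below is just an example)
# ls -l /dev/disk/by-id/ | grep -w sdo # find the ata- link ending in the new drive's serial number
# zpool status -v peter # confirm the old disk's by-id name shown as UNAVAIL, then run zpool replace as above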