Equipment/Landin: Difference between revisions

From London Hackspace Wiki

no edit summary
No edit summary
 
(27 intermediate revisions by 4 users not shown)
Line 5: Line 5:
|category=Equipment <!-- Main category. Please leave alone to keep item in this category -->
|category=Equipment <!-- Main category. Please leave alone to keep item in this category -->
|subcat=Systems <!-- Sub-category if one exists. Please check main listing to see other categories contained within the main one -->
|subcat=Systems <!-- Sub-category if one exists. Please check main listing to see other categories contained within the main one -->
|status=Good working order
|status=Operational
|consumables=<!-- Any items used up in normal operation, such as; ink, paper, saw-blades, cutting disks, oil, etc.. -->
|consumables=<!-- Any items used up in normal operation, such as; ink, paper, saw-blades, cutting disks, oil, etc.. -->
|accessories=<!-- Any items associated with the equipment but not consumable, such as; drill bits, safety gloves, goggles, etc.. -->
|accessories=<!-- Any items associated with the equipment but not consumable, such as; drill bits, safety gloves, goggles, etc.. -->
Line 12: Line 12:
|acnode=no
|acnode=no
|owner=LHS
|owner=LHS
|origin=Donation
|origin=Donation from kraptv
|location=Basement rack <!-- Floor, room/zone and location within that area -->
|location=first floor comms room<!-- Floor, room/zone and location within that area -->
|maintainers=Sysadmin team <!-- If someone is nominated as managing the upkeep of this item, please list them here. No links please; it currently breaks the template. -->
|maintainers=Sysadmin team <!-- If someone is nominated as managing the upkeep of this item, please list them here. No links please; it currently breaks the template. -->
|template_ver=1.1 <!-- Please do not change. Used for tracking out-of-date templates -->
|template_ver=1.1 <!-- Please do not change. Used for tracking out-of-date templates -->
Line 24: Line 24:


= Info =
= Info =
* IP: 10.20.20.10
* IP: 10.0.20.10
* DNS: landin.london.hackspace.org.uk
* DNS: landin.london.hackspace.org.uk
* Access: LDAP
* Access: LDAP


= Stats =
= Stats =
Landin is a Xyratex HS-1235T (OEM platform for IBM XIV, Dell Compellent, LaCie 12Big, Pure FA-300, and others - note also NetApp disk trays such as DS4243 and other Xyratex OEM users fit as well)
Landin is a Xyratex HS-1235T (OEM storage server platform for IBM XIV, Dell Compellent, LaCie 12Big, Pure FA-300, and several others; disk trays from other Xyratex OEM customers, such as the NetApp DS4243, fit in the array as well)


Note that the <span style="color:red"> power button</span> is just to the inside-front-left (just around the corner from the front-facing LED status lights)
Note that the <span style="color:red"> power button</span> is just to the inside-front-left (just around the corner from the front-facing LED status lights)
Line 39: Line 39:
* [https://www.servethehome.com/lsi-sas-2008-raid-controller-hba-information/ Avago LSI SAS2008 SAS PCIe JBOD Controller] with the following ZFS disk configuration:  
* [https://www.servethehome.com/lsi-sas-2008-raid-controller-hba-information/ Avago LSI SAS2008 SAS PCIe JBOD Controller] with the following ZFS disk configuration:  
** 12-drive ([https://www.hgst.com/sites/default/files/resources/Ultrastar_A7K1000_final_DS.pdf 1TB HGST HUA721010KLA330]) single-pool RAIDZ2 (10TB usable) mounted as /peter
** 12-drive ([https://www.hgst.com/sites/default/files/resources/Ultrastar_A7K1000_final_DS.pdf 1TB HGST HUA721010KLA330]) single-pool RAIDZ2 (10TB usable) mounted as /peter
== Documentation ==
* [[File:HS-1235T-ATX_Quick_Reference_Sheet.pdf]] - Xyratex HS-1235T Kontron Motherboard Quick Reference Sheet (Slot speeds, etc.)
* [[File:User_Manual_12big_Rack_Storage_Server_EN.pdf]] - System User Manual
* [[File:Quick_Install_Guide_12big_Rack_Storage_Server_EN.pdf]] - Xyratex HS-1235T Quick Install Guide (LaCie Branding)


= Build Notes =
= Build Notes =
# These are the notes for the build of Landin (and its functional twin Blanton)
# These are the notes for the build of Landin (and its functional twin [[Equipment/Blanton|Blanton]])
# HW config and notes here: https://wiki.london.hackspace.org.uk/view/Equipment/Landin
# HW config and notes here: https://wiki.london.hackspace.org.uk/view/Equipment/Landin


Line 69: Line 74:
Please note you should add "contrib non-free" after main to the /etc/apt/sources.list for ZFS!
Please note you should add "contrib non-free" after main to the /etc/apt/sources.list for ZFS!


'''iotop htop sudo finger bsdgames ethtool* lynx elinks net-tools openssh-server sudo screen iproute resolvconf build-essential tcpdump vlan ethtool rsync git rdist bzip2 git-core less unzip curl flex bc bison netcat nmap locate vim zsh vim-scripts zfs-dkms zfsutils-linux nfs-kernel-server samba-common-bin qemu-kvm libvirt-clients libvirt-daemon-system libvirt-daemon lshw ipmitool tftpd-hpa apt-mirror smartmontools iozone3 minicom tmux mosh
'''iotop htop sudo finger bsdgames ethtool* lynx elinks net-tools openssh-server sudo screen iproute resolvconf build-essential tcpdump vlan ethtool rsync git rdist bzip2 git-core less unzip curl flex bc bison netcat nmap locate vim zsh vim-scripts zfs-dkms zfsutils-linux nfs-kernel-server samba-common-bin qemu-kvm libvirt-clients libvirt-daemon-system libvirt-daemon lshw ipmitool tftpd-hpa apt-mirror smartmontools iozone3 minicom tmux mosh silversearcher-ag
'''
'''


Line 95: Line 100:
   sudo zpool create -f kinnaman raidz2 /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAG06BGA /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAG06EWA /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAG0DJ9A /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ93TMF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9ES2F /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9GPHF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9J1EF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9J59F /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9N1AF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9N2TF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9N3EF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PBJ76D4F  
   sudo zpool create -f kinnaman raidz2 /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAG06BGA /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAG06EWA /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAG0DJ9A /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ93TMF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9ES2F /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9GPHF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9J1EF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9J59F /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9N1AF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9N2TF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PAJ9N3EF /dev/disk/by-id/ata-HITACHI_HUA721010KLA330_PBJ76D4F  
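
A pool command like the one above needs the full list of /dev/disk/by-id paths. The following is only a sketch of how that list could be gathered: the function name <code>disks_by_model</code> and its optional directory argument are made up for illustration, and it assumes the by-id names follow the <code>ata-MODEL_SERIAL</code> pattern seen above.

```shell
# Sketch: list whole-disk by-id paths for one drive model.
# Partition entries (names ending in -partN) are skipped.
# disks_by_model is a hypothetical helper, not part of ZFS.
disks_by_model() {
    model="$1"
    dir="${2:-/dev/disk/by-id}"
    for path in "$dir"/ata-"$model"_*; do
        case "$path" in
            *-part[0-9]*) continue ;;
        esac
        # An unmatched glob stays literal, so check the path exists.
        if [ -e "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}
```

For example, <code>disks_by_model HITACHI_HUA721010KLA330</code> prints one path per drive, which can be reviewed before being pasted into a zpool command.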


==== ZFS Disk Death - what to do ====
===Proxmox setup===
If a 1 or 2 disks die in the ZFS zpool, you'll want to replace them. You'll see something like a disk or two with the status UNAVAIL and the zpool state being DEGRADED.
We installed Debian Stretch (Debian 9.4.0 at the time) and then followed the [https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_Stretch Install Proxmox VE on Debian Stretch] documentation. After that we needed to install the upgraded ZFS ZED daemon via apt-get and upgrade our zpool version as well.
We don't want to shut off the computer, so what to do?
 
* Make note of the disk ID(s) and search for those drives by doing
  sudo sas2ircu 0 display | less
* While scrolling up and down using less, you can find the affected dying drive serial number (starts with the letter P in our Hitachi examples)
* Make a note of the enclosure number and the slot number on the controller in the command above.  
* Make the affected disk(s) blink in their slots if you have enclosures that blink properly, such as
  sas2ircu 0 locate 2:1 on #REPLACE WITH THE RIGHT SLOT AND ENCLOSURE ID, DON'T JUST CUT AND PASTE THIS COMMAND AND REPLACE THE WRONG DRIVE. (This is blinking drive 1 in assembly 2)
* then you'll see the blinking slot(s) and can remove those affected disks.
* Replace the drives in the disk trays (you may need a Torx T10 driver or a careful flathead screwdriver to replace drives in the tray, and then reinsert.
* Turn the blinking light off.
  sas2ircu 0 locate 2:1 off
* Find the new drive by either seeing the latest drive added in dmesg and then poking around /dev/disk/by-id for the right serial number. Example disk replacement (remember, use zpool status to find the old disk to replace)
  sudo zpool replace -f peter ata-HITACHI_HUA721010KLA330_PAJ9N3EF ata-HITACHI_HUA721010KLA330_PBJ7DNWE


===Proxmox setup===
# We'll probably just edit LDAP users to be in that group rather than complicate things with local-remote overlays!
# We'll probably just edit LDAP users to be in that group rather than complicate things with local-remote overlays!
# libvirt:x:113: and libvirt-qemu:x:64055:
# libvirt:x:113: and libvirt-qemu:x:64055:
Line 168: Line 160:


=== Networks ===
=== Networks ===
 
* '''bond0''' LACP group of 4 gigabit ethernet interfaces, tagged with VLANs
* '''vmbr0''' - Standard Linux Bridge, bridged to enp2s0. Think of it like an internal switch. Any VM attached to this bridge is effectively attached to the LAN that Landin is connected to.
==== Bridges ====
* '''vmbr0''' - Standard Linux Bridge, bridged to bond0.20. Think of it like an internal switch. Any VM attached to this bridge is effectively attached to the Servers VLAN
* '''vmbr1''' - Standard Linux Bridge, bridged to bond0.30. This is for the CCTV network - you probably don't want this one!
* '''vmbr2''' - Standard Linux Bridge, bridged to bond0.10. This is for the management network - you probably don't want this one!


= Current VMs =
= Current VMs =
=== Chomsky ===
=== Chomsky ===
Chomsky is a general-purpose system for LHS member usage ([[IRC|IRC client use]], [[Robonaut]], shell interaction, http://hack.rs/ URL & forwards, light programming tasks, etc.).
* If you are a current London Hackspace member and would like to log in to Chomsky, please create and [https://london.hackspace.org.uk/members/ldap.php enable your LDAP login here].
* Once your LDAP login has been created, use your ssh client and account details to connect to <span style="color: blue;">chomsky.hack.rs</span>.  (We also resolve internally to <span style="color: blue;">chomsky.lan.london.hackspace.org.uk</span>)
* If you have a software package you'd like installed on the system, please contact any of the maintainers via [[IRC]] or the [[mailing list]] and we'll do our best to accommodate you.


=== ACserver ===
=== ACserver ===


=== Adminstuff ===
=== Adminstuff ===
* Adminstuff serves the network admin bits that were originally on the retired physical host [[denning]]. It now runs Ansible, apt-cacher-ng, tftpboot + pxeboot stuff, and the NFS server for diskless booting via [[Netboot]].
==== apt-cacher-ng ====


==== apt-cacher-ng ====
Should you want to use our local cache when installing the latest Debian/Ubuntu/Raspbian, you can point apt at our local proxy.
 
Simply specify '''http://adminstuff.lan.london.hackspace.org.uk:3142''' or have the line
  Acquire::http::Proxy "http://adminstuff.lan.london.hackspace.org.uk:3142/";
 
in something like /etc/apt/apt.conf.d/proxy.conf
 
Remember to delete this file if you take your computer off of the Hackspace network!
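
The create-and-delete steps above could be wrapped in a small script. This is a minimal sketch, not something installed on the system: the function names and the <code>CONF</code> override are assumptions made for illustration, while the proxy URL and the file path are the ones given above.

```shell
# Sketch: enable or disable the Hackspace apt proxy.
# PROXY_URL and the default CONF path come from this page;
# the function names are hypothetical.
PROXY_URL="http://adminstuff.lan.london.hackspace.org.uk:3142/"
CONF="${CONF:-/etc/apt/apt.conf.d/proxy.conf}"

proxy_line() {
    # Print the single apt.conf line that enables the proxy.
    printf 'Acquire::http::Proxy "%s";\n' "$PROXY_URL"
}

enable_proxy() {
    # Needs root when CONF points into /etc.
    proxy_line > "$CONF"
}

disable_proxy() {
    # Run this before taking the machine off the Hackspace network.
    rm -f "$CONF"
}
```

Source the file as root and call <code>enable_proxy</code> while on the Hackspace network, then <code>disable_proxy</code> before leaving.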


==== Redmine ====
==== Redmine ====
Line 185: Line 193:


=== Services ===
=== Services ===
* apt-mirror / apt-cacher-ng (we probably only want one of these)
* TFTP Serving for PXE Boot Support
=== Scheduled Services ===
We use our reasonably equipped data storage and bandwidth to our advantage, especially when synchronising new Ubuntu and Debian variants.
* '''apt-mirror''' syncs the following Debian and Debian-derived repositories at 4AM every morning:
  Debian Unstable main contrib non-free
  Debian Stable main contrib non-free
  Debian Stretch main contrib non-free
  Ubuntu 16.04 main restricted universe multiverse
  Ubuntu 18.04 main restricted universe multiverse
  Raspbian jessie main contrib non-free rpi
  Raspbian stretch main contrib non-free rpi
* ZFS Scrubbing for Data Health & Verification


= How to: =
= How to: =
Line 191: Line 216:
==== Via the web interface ====
==== Via the web interface ====


# Go to  
# Go to https://landin.lan.london.hackspace.org.uk:8006
# Log in with your LDAP credentials
# Click Create VM in the top right corner
# In the General tab, click Advanced in the lower right corner, then set the name and check "start at boot"
# In the OS tab, select your desired ISO image in the drop down list and configure the parameters for the guest OS
# In the Storage tab, select a SCSI device, set the storage to the "peter" zpool and enter your desired disk size. Check Advanced and also check the "discard" box (important for thin provisioning)
# In the CPU tab, select your desired number of cores and sockets
# In the Memory tab, select your desired size for the RAM
# In the Network tab, select "vmbr0" for the bridge and set the model to "VirtIO"
# In the Confirm tab, check "start after created" and click Finish
 
==== Via CLI ====
==== Via CLI ====


Line 202: Line 237:
# Example of a Debian VM with a single core, 512MB of RAM, 10G HDD and connected to the "Bridge" interface <pre>qm create 104 --name "qm-test" --cdrom /var/lib/vz/template/iso/debian-9.4.0-amd64-netinst.iso --memory 512 --cores 1 --net0 "virtio,bridge=vmbr0" --scsi0 "file=peter:10,discard=on,size=10G"</pre>
# Example of a Debian VM with a single core, 512MB of RAM, 10G HDD and connected to the "Bridge" interface <pre>qm create 104 --name "qm-test" --cdrom /var/lib/vz/template/iso/debian-9.4.0-amd64-netinst.iso --memory 512 --cores 1 --net0 "virtio,bridge=vmbr0" --scsi0 "file=peter:10,discard=on,size=10G"</pre>
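
When creating several VMs it can help to build the command from a few parameters and review it before running it. The sketch below only prints a command shaped like the example above. The function name <code>qm_create_cmd</code> and its argument order are illustrative, and the bridge (vmbr0) and storage (peter) are hard-coded to the values used on this page.

```shell
# Sketch: print (do not run) a qm create command line.
# Arguments: VM ID, name, memory in MB, cores, disk size in GB, ISO path.
# qm_create_cmd is a hypothetical helper, not a Proxmox tool.
qm_create_cmd() {
    vmid="$1"; name="$2"; mem_mb="$3"; cores="$4"; disk_gb="$5"; iso="$6"
    printf 'qm create %s --name "%s" --cdrom %s --memory %s --cores %s --net0 "virtio,bridge=vmbr0" --scsi0 "file=peter:%s,discard=on,size=%sG"\n' \
        "$vmid" "$name" "$iso" "$mem_mb" "$cores" "$disk_gb" "$disk_gb"
}
```

Calling <code>qm_create_cmd 104 qm-test 512 1 10 /var/lib/vz/template/iso/debian-9.4.0-amd64-netinst.iso</code> prints the same command as the example, ready to be checked and pasted into a root shell.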


=== Check RAID array status ====
==== From an existing disk image ====
 
Create a VM from the CLI or web interface as above; there is no need to start it. Then delete its disk from the hardware config.
 
Then follow this: http://dae.me/blog/2340/how-to-add-an-existing-virtual-disk-to-proxmox/
 
If the old VM image is stored on ZFS then you'll need to set the disk cache used by Proxmox to "writeback"
 
Once the disk appears in the Proxmox UI you can add it to the VM and activate it (the exact steps were not written down, but the cache setting is the main thing to know)
 
==== Notes ====
 
There is an apt-cacher-ng setup on landin listening on 10.0.20.10:3142
 
Netbooting should work now. The TFTP server and files are on the adminstuff VM. There is a Debian Stretch installer with a preseed config that sets up root ssh keys for some of the admins.
 
=== RAID Status and How to Blink a Light and Replace a Drive ===
 
Thankfully the system is not in the middle of a woodshop, but the batch of Hitachi 1TB drives is pretty old and we should expect disk failures to happen.  This is an overview of the tools available to diagnose the health of the array.
 
==== How is the ZFS Zpool Health, How is the Hardware Health ====
* Very likely you want to see how ZFS sees the drives.  This command should suffice:
  # zpool status -v
* You can check the list of hardware connected to the array via the LSI (Avago/Broadcom) utility '''sas2ircu'''
  # sas2ircu 0 display
(you'll want to pipe this to less or a text file to scroll through the various notes)
* Maybe you want to run through '''smartctl''' and see whether any of the disks are in a pre-fail state. Try a shell script like this:
  for i in {a..o}; do
      echo "Disk sd$i"
      # Show the serial number, attributes 5, 197 and 198, and anything failing now
      smartctl -i -A /dev/sd$i | grep -E "^ *(5|197|198) |FAILING_NOW|Serial Number"
  done
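
To cut the output down further, the raw values can be checked so that only drives with actual reallocated or pending sectors are listed. This is a sketch under assumptions: the function name <code>smart_warnings</code> is made up, and it assumes the usual ten-column <code>smartctl -A</code> attribute table in which RAW_VALUE is the tenth field.

```shell
# Sketch: read "smartctl -A" output on stdin and print attributes
# 5, 197 and 198 only when their raw value is greater than zero.
# Output per match: attribute ID, attribute name, raw value.
smart_warnings() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && ($10 + 0) > 0 { print $1, $2, $10 }'
}
```

Usage would be along the lines of <code>smartctl -A /dev/sda | smart_warnings</code>. No output means none of the three counters has a non-zero raw value.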
 
==== ZFS Disk Death - what to do ====
If one or two disks die in the ZFS zpool, you'll want to replace them. You'll see a disk or two with the status UNAVAIL and the zpool state will be DEGRADED.
We don't want to shut off the computer, so what to do?
* Make note of the disk ID(s) and search for those drives by doing
  # sas2ircu 0 display | less
* While scrolling up and down using less, you can find the affected dying drive serial number (starts with the letter P in our Hitachi examples)
* Make a note of the enclosure number and the slot number on the controller in the command above.
* Make the affected disk(s) blink in their slots if you have enclosures that blink properly. '''DON'T JUST CUT AND PASTE THIS COMMAND: IF YOU MAKE THE WRONG SLOT BLINK YOU WILL REPLACE THE WRONG DRIVE.''' The example below blinks the drive in slot 1 of enclosure 2:
  # sas2ircu 0 locate 2:1 on
* then you'll see the blinking slot(s) and can remove those affected disks.
* Replace the drives in the disk trays (you may need a Torx T10 driver or a careful flathead screwdriver to replace drives in the tray), and then reinsert the trays.
* Turn the blinking light off.
  # sas2ircu 0 locate 2:1 off
* Find the new drive by checking dmesg for the most recently added drive and then looking in /dev/disk/by-id for the matching serial number. Example disk replacement (remember, use zpool status to find the old disk to replace):
  # zpool replace -f peter ata-HITACHI_HUA721010KLA330_PAJ9N3EF ata-HITACHI_HUA721010KLA330_PBJ7DNWE


(As root):
You can then run <pre>zpool status -v</pre> to see the replacement in progress and an estimate of the time left to finish replacing the old drive in the ZFS array. Nice!
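
Looking up the enclosure and slot by eye in the sas2ircu listing is where mistakes tend to creep in, so the lookup could be scripted. The sketch below is an assumption-laden helper, not a sas2ircu feature: the function name <code>slot_for_serial</code> is made up, and it assumes each device block in the <code>sas2ircu 0 display</code> output lists "Enclosure #", "Slot #" and "Serial No" lines in that order. Check the result against the listing before pulling a drive.

```shell
# Sketch: read "sas2ircu 0 display" output on stdin and print
# "enclosure:slot" for the drive whose serial number is given as $1.
slot_for_serial() {
    awk -F ' *: *' -v serial="$1" '
        { sub(/[ \t\r]+$/, "") }          # strip trailing whitespace
        /Enclosure #/ { enc = $2 }        # remember the current enclosure
        /Slot #/      { slot = $2 }       # remember the current slot
        /^ *Serial No/ && $2 == serial { print enc ":" slot }
    '
}
```

For example, <code>sas2ircu 0 display | slot_for_serial PAJ9N3EF</code> prints a value that can be passed to <code>sas2ircu 0 locate</code>. No output means the serial number was not found.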