# Development Infrastructure Recovery Runbook

## Purpose

This runbook gives a simple recovery path when the Agrarian development machines or shared project storage are unreachable. It covers:

- Unraid `DevBox`
- `Ubuntu-Codex`
- `Windows-Builder`
- The `projects` SMB share

Use the least disruptive recovery path first. Do not reboot `DevBox` until VM-level and service-level checks have failed, because it hosts shared storage and the development VMs.

## Current Baseline

| System | Role | Address / Name | Notes |
| --- | --- | --- | --- |
| `DevBox` | Unraid host, SMB storage, VM host | `192.168.5.8` / `DevBox` | Hosts the `projects` share and the VMs. |
| `Ubuntu-Codex` | Source-control and automation VM | `192.168.5.10` expected; may also appear as `192.168.5.6` or `192.168.5.9`, depending on the interface | Mounts `//192.168.5.8/projects` at `/mnt/projects`. |
| `Windows-Builder` | Unreal/Visual Studio/GPU build VM | `192.168.5.12` | Uses fixed VirtIO MAC `52:54:00:17:ec:5d`. |
| `projects` share | Shared Unreal project storage | `\\DevBox\projects` / `/mnt/projects` | Repo path is `/mnt/projects/AgrarianGameBulid`. |

## First Triage

From `Ubuntu-Codex` or another LAN machine:

```bash
ping -c 3 192.168.5.8
ping -c 3 192.168.5.12
nc -vz -w 5 192.168.5.12 3389
mount | rg '/mnt/projects|cifs|smb'
```

From `DevBox`:

```bash
virsh list --all
virsh domiflist Windows-Builder
virsh domiflist Ubuntu-Codex
```

Expected VM NIC baselines:

- `Windows-Builder`: bridge `br0`, model `virtio-net`, MAC `52:54:00:17:ec:5d`
- `Ubuntu-Codex`: bridge `br0`, model `virtio-net`, MAC `52:54:00:a5:cf:63`
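
The NIC baselines above can be compared against `virsh domiflist` output mechanically instead of by eye. The following is a minimal sketch, not an existing script in this repo: `nic_matches` is a hypothetical helper name, and it assumes the usual `domiflist` column order (Interface, Type, Source, Model, MAC).

```bash
# nic_matches BRIDGE MODEL MAC   (hypothetical helper, not part of virsh)
# Reads `virsh domiflist <VM>` output on stdin and succeeds only if one
# interface row has the expected bridge, model, and MAC address.
# Assumes the column order: Interface, Type, Source, Model, MAC.
nic_matches() {
  awk -v bridge="$1" -v model="$2" -v mac="$3" '
    tolower($5) == tolower(mac) && $3 == bridge && $4 == model { ok = 1 }
    END { exit (ok ? 0 : 1) }
  '
}

# Usage on DevBox:
#   virsh domiflist Windows-Builder | nic_matches br0 virtio-net 52:54:00:17:ec:5d
#   virsh domiflist Ubuntu-Codex    | nic_matches br0 virtio-net 52:54:00:a5:cf:63
```

A non-zero exit status means the VM's NIC has drifted from the baseline (or the VM has no such interface) and is worth inspecting before any restart.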

## Safe Reboot Order

Use this order when multiple systems are unhealthy:

1. Save or stop active work where possible.
2. Restart only the failing service if the host is reachable.
3. Restart the affected VM from inside the guest if guest access works.
4. Use `virsh shutdown <VM>` from `DevBox` if guest access does not work.
5. Use `virsh reboot <VM>` only when a graceful shutdown is not enough.
6. Use `virsh destroy <VM>` only when the VM is hung and no graceful path works.
7. Reboot `DevBox` only after confirming SMB, libvirt, or host networking cannot be recovered individually.
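
Step 4 of the order above is easier to follow with a bounded wait, so it is clear when the graceful path has actually failed. This is a sketch only: `graceful_stop` is a hypothetical helper, and the 120-second timeout and 5-second poll interval are assumed defaults, not values taken from this environment. It deliberately never calls `virsh destroy`; that decision stays with the operator.

```bash
# graceful_stop VM [TIMEOUT_SECONDS] [POLL_SECONDS]   (hypothetical helper)
# Requests a graceful shutdown and waits until the VM reports "shut off".
# Returns 0 when the VM shut down, 1 on timeout, 2 if the shutdown request
# itself was rejected (for example, the VM is not running).
# The defaults of 120 s and 5 s are assumptions; tune them to the guest.
graceful_stop() {
  gs_vm="$1"
  gs_timeout="${2:-120}"
  gs_poll="${3:-5}"
  gs_waited=0

  virsh shutdown "$gs_vm" >/dev/null 2>&1 || return 2

  while [ "$gs_waited" -lt "$gs_timeout" ]; do
    if [ "$(virsh domstate "$gs_vm" 2>/dev/null)" = "shut off" ]; then
      return 0
    fi
    sleep "$gs_poll"
    gs_waited=$((gs_waited + gs_poll))
  done
  return 1
}

# Usage on DevBox:
#   graceful_stop Windows-Builder || echo "Graceful shutdown failed; escalate manually."
```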

Before planned VM shutdowns, consider a manual VM backup if the change is risky:

```bash
/bin/bash /boot/config/custom/agrarian-vm-backup.sh --shutdown-running --vm Windows-Builder
/bin/bash /boot/config/custom/agrarian-vm-backup.sh --shutdown-running --vm Ubuntu-Codex
```

## Windows-Builder Recovery

Use these in order:

1. Confirm the VM is running:

```bash
virsh domstate Windows-Builder
virsh domiflist Windows-Builder
```

2. From `Ubuntu-Codex`, confirm RDP is listening:

```bash
nc -vz -w 5 192.168.5.12 3389
```

3. Where possible, use the QEMU guest-agent path before relying on RDP.
4. If RDP is down but guest commands work, check:

```powershell
Get-Service TermService
Get-NetConnectionProfile
Get-NetFirewallRule -DisplayGroup "Remote Desktop"
```

5. Restart RDP only if it is not listening:

```powershell
Restart-Service TermService -Force
```

6. If Unreal visual inspection is needed, use Sunshine/Moonlight instead of RDP.
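
For step 3, one way to test the guest-agent path from `DevBox` is a `guest-ping` through libvirt. This is a sketch that assumes the QEMU guest agent is installed and running inside `Windows-Builder` and that its virtio channel is defined in the VM; `agent_alive` is a hypothetical helper name.

```bash
# agent_alive VM   (hypothetical helper)
# Succeeds if the QEMU guest agent inside VM answers a guest-ping.
# Assumes the guest agent service and its channel are already configured.
agent_alive() {
  virsh qemu-agent-command "$1" '{"execute":"guest-ping"}' >/dev/null 2>&1
}

# Usage on DevBox:
#   if agent_alive Windows-Builder; then
#     # Ask the agent which addresses the guest currently holds.
#     virsh domifaddr Windows-Builder --source agent
#   fi
```

If the agent answers but `192.168.5.12` is missing from the reported addresses, the problem is more likely guest networking than the RDP service.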

Detailed Windows-Builder references:

- `Docs/Ops/WindowsBuilderNetworkRdpStability.md`
- `Docs/Ops/WindowsBuilderGpuRemoteAccess.md`

## Ubuntu-Codex Recovery

Use these in order:

1. Confirm VM state from `DevBox`:

```bash
virsh domstate Ubuntu-Codex
virsh domiflist Ubuntu-Codex
```

2. Confirm SSH or console access.
3. Confirm the project mount:

```bash
mount | rg '/mnt/projects'
ls -la /mnt/projects/AgrarianGameBulid
```

4. If `/mnt/projects` is missing, remount the SMB share using the existing system mount configuration rather than creating a new ad hoc mount.
5. Confirm Git access:

```bash
git -C /mnt/projects/AgrarianGameBulid status --short
git -C /mnt/projects/AgrarianGameBulid remote -v
```
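
For step 4, remounting through the existing configuration can look like the sketch below. It assumes the share is defined in `/etc/fstab` on `Ubuntu-Codex`; if the mount is managed some other way (for example, a systemd mount unit), use that mechanism instead. `ensure_projects_mount` is a hypothetical helper name.

```bash
# ensure_projects_mount [MOUNTPOINT]   (hypothetical helper; run as root)
# Mounts the share only if it is not already mounted.
# Assumption: MOUNTPOINT has an entry in /etc/fstab, so no share path,
# credentials, or options are passed here.
ensure_projects_mount() {
  epm_path="${1:-/mnt/projects}"
  if mountpoint -q "$epm_path"; then
    return 0
  fi
  # Given only the mount point, mount(8) looks up the rest in /etc/fstab.
  mount "$epm_path"
}

# Usage on Ubuntu-Codex:
#   ensure_projects_mount && ls -la /mnt/projects/AgrarianGameBulid
```

Passing only the mount point keeps the credentials and options in one place, which is the point of avoiding ad hoc mounts.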

Do not wipe local changes to recover the VM. Preserve uncommitted work first with a commit, patch, or backup copy outside the repo.
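
As an illustration of the patch option, the sketch below writes all staged and unstaged changes to tracked files into a timestamped patch file. `save_patch` and the `uncommitted-<timestamp>.patch` naming are hypothetical, and the output directory is whatever location outside the repo you choose. Untracked files are not included in `git diff`, so copy those separately (they show as `??` in `git status --short`).

```bash
# save_patch REPO OUTDIR   (hypothetical helper)
# Writes tracked, uncommitted changes (staged and unstaged) to a patch file
# in OUTDIR and prints the file path. OUTDIR should be outside the repo.
# Untracked files are NOT captured by `git diff`.
save_patch() {
  sp_out="$2/uncommitted-$(date +%Y%m%d-%H%M%S).patch"
  git -C "$1" diff --binary HEAD > "$sp_out" || return 1
  echo "$sp_out"
}

# Usage on Ubuntu-Codex (the output directory here is only an example):
#   save_patch /mnt/projects/AgrarianGameBulid "$HOME"
```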

## DevBox And SMB Recovery

Use these in order:

1. Confirm the host is reachable:

```bash
ping -c 3 192.168.5.8
```

2. Confirm Unraid services and VM state through the Unraid UI or SSH.
3. Confirm the `projects` share is visible:

```bash
smbclient -L //192.168.5.8 -N
```

4. Confirm `Ubuntu-Codex` sees the share mounted:

```bash
mount | rg '/mnt/projects'
```

5. If the DNS name `DevBox` fails to resolve but the IP works, use the IP temporarily and repair the router/DNS record later.
6. Avoid storing project files directly on the Unraid OS boot filesystem. Project data belongs on the `projects` share or in VMs.
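
The share listing from step 3 can be checked without reading it by eye. This is a minimal sketch: `share_listed` is a hypothetical helper, and it assumes the usual `smbclient -L` listing in which each share row starts with the share name followed by its type.

```bash
# share_listed NAME   (hypothetical helper)
# Reads `smbclient -L` output on stdin and succeeds if NAME appears
# as a share of type "Disk".
share_listed() {
  awk -v name="$1" '
    $1 == name && $2 == "Disk" { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage from Ubuntu-Codex or another LAN machine:
#   smbclient -L //192.168.5.8 -N | share_listed projects
```

A failure here while `ping` succeeds points at the SMB service or share export rather than at host networking.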

## When To Stop And Inspect Before Rebooting

Pause before rebooting if:

- A build, package, backup, or VM disk copy is running.
- Unreal Editor is open with unsaved assets.
- Git has uncommitted changes that are not understood.
- A backup or restore test is in progress.
- `DevBox` has disk or array warnings.

## After Recovery

After any VM or `DevBox` recovery:

1. Confirm `/mnt/projects/AgrarianGameBulid` is reachable.
2. Run `git status --short` in the repo.
3. Confirm `Windows-Builder` RDP with `nc -vz -w 5 192.168.5.12 3389`.
4. Confirm Sunshine only if visual inspection is needed.
5. Record unusual recovery steps in the handoff notes or the relevant ops doc.
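
Steps 1 to 3 can be bundled into a single pass/fail check run from `Ubuntu-Codex`. This is a sketch under assumptions, not an existing script: `post_recovery_check`, `REPO`, and `RDP_HOST` are hypothetical names, with defaults taken from the baseline table.

```bash
# Hypothetical names; defaults come from the baseline table in this runbook.
REPO="${REPO:-/mnt/projects/AgrarianGameBulid}"
RDP_HOST="${RDP_HOST:-192.168.5.12}"

# post_recovery_check
# Runs the three scripted checks and prints one line per failure,
# or "OK" when everything passes. Returns non-zero if any check failed.
post_recovery_check() {
  prc_failed=0
  [ -d "$REPO" ] || { echo "FAIL: $REPO is not reachable"; prc_failed=1; }
  git -C "$REPO" status --short >/dev/null 2>&1 \
    || { echo "FAIL: git status failed in $REPO"; prc_failed=1; }
  nc -z -w 5 "$RDP_HOST" 3389 >/dev/null 2>&1 \
    || { echo "FAIL: RDP port 3389 on $RDP_HOST is not reachable"; prc_failed=1; }
  if [ "$prc_failed" -eq 0 ]; then
    echo "OK"
  fi
  return "$prc_failed"
}

# Usage on Ubuntu-Codex:
#   post_recovery_check
```

Sunshine and the handoff notes stay manual, since they depend on whether visual inspection was needed and on what happened during recovery.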