This repository has been archived on 2026-05-24. You can view files and clone it. You cannot open issues or pull requests or push a commit.

# Development Infrastructure Recovery Runbook
## Purpose
This runbook gives a simple recovery path when the Agrarian development machines or shared project storage are unreachable. It covers:
- Unraid `DevBox`
- `Ubuntu-Codex`
- `Windows-Builder`
- The `projects` SMB share
Use the least disruptive recovery path first. Do not reboot `DevBox` until VM-level and service-level checks have failed, because it hosts shared storage and the development VMs.
## Current Baseline
| System | Role | Address / Name | Notes |
| --- | --- | --- | --- |
| `DevBox` | Unraid host, SMB storage, VM host | `192.168.5.8` / `DevBox` | Hosts `projects` share and VMs. |
| `Ubuntu-Codex` | Source-control and automation VM | `192.168.5.10` expected; the host may currently report `192.168.5.6` or `192.168.5.9` instead, depending on the interface | Mounts `//192.168.5.8/projects` at `/mnt/projects`. |
| `Windows-Builder` | Unreal/Visual Studio/GPU build VM | `192.168.5.12` | Uses fixed VirtIO MAC `52:54:00:17:ec:5d`. |
| `projects` share | Shared Unreal project storage | `\\DevBox\projects` / `/mnt/projects` | Repo path is `/mnt/projects/AgrarianGameBulid`. |
## First Triage
From `Ubuntu-Codex` or another LAN machine:
```bash
ping -c 3 192.168.5.8
ping -c 3 192.168.5.12
nc -vz -w 5 192.168.5.12 3389
mount | rg '/mnt/projects|cifs|smb'
```
From `DevBox`:
```bash
virsh list --all
virsh domiflist Windows-Builder
virsh domiflist Ubuntu-Codex
```
Expected VM NIC baselines:
- `Windows-Builder`: bridge `br0`, model `virtio-net`, MAC `52:54:00:17:ec:5d`
- `Ubuntu-Codex`: bridge `br0`, model `virtio-net`, MAC `52:54:00:a5:cf:63`
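The baselines above can also be checked mechanically. The following is a minimal sketch, not part of the existing tooling: `check_nic_baseline` is a hypothetical helper that reads `virsh domiflist` output on stdin and succeeds only if an interface with the expected bridge and MAC is present. It assumes the usual five-column `domiflist` layout (Interface, Type, Source, Model, MAC) and leaves the model column unchecked to keep the sketch short.

```bash
# Hypothetical helper: succeed if stdin (virsh domiflist output) contains
# an interface on the expected bridge with the expected MAC address.
check_nic_baseline() {
  local want_bridge="$1" want_mac="$2"
  # Assumed columns: Interface Type Source Model MAC
  awk -v bridge="$want_bridge" -v mac="$want_mac" '
    $3 == bridge && tolower($5) == tolower(mac) { found = 1 }
    END { exit !found }
  '
}

# Example, run on DevBox:
#   virsh domiflist Windows-Builder | check_nic_baseline br0 52:54:00:17:ec:5d
```

A non-zero exit status means the live NIC no longer matches the documented baseline and should be inspected before any other recovery step.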
## Safe Reboot Order
Use this order when multiple systems are unhealthy:
1. Save or stop active work where possible.
2. Restart only the failing service if the host is reachable.
3. Restart the affected VM from inside the guest if guest access works.
4. Use `virsh shutdown <VM>` from `DevBox` if guest access does not work.
5. Use `virsh reboot <VM>` only when a graceful shutdown is not enough. Like `virsh shutdown`, it still depends on the guest responding.
6. Use `virsh destroy <VM>` only when the VM is hung and no graceful path works.
7. Reboot `DevBox` only after confirming SMB, libvirt, or host networking cannot be recovered individually.
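Steps 4 to 6 can be sketched as a helper that requests a graceful shutdown and waits for it, leaving the forced `virsh destroy` as a manual decision. `graceful_shutdown`, the 120-second default timeout, and the 5-second poll interval are assumptions for illustration, not values taken from the current setup.

```bash
# Hypothetical helper: request a graceful shutdown and wait for "shut off".
# Returns non-zero if the VM is still running when the timeout expires.
# It never calls `virsh destroy`; that escalation stays a human decision.
graceful_shutdown() {
  local vm="$1"
  local timeout="${2:-120}"   # seconds to wait (assumed default)
  local interval=5            # poll interval in seconds (assumed)
  local waited=0

  virsh shutdown "$vm" || return 1

  while [ "$waited" -lt "$timeout" ]; do
    if [ "$(virsh domstate "$vm")" = "shut off" ]; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done

  echo "$vm is still running after ${timeout}s; inspect it before using virsh destroy" >&2
  return 1
}

# Example, run on DevBox:
#   graceful_shutdown Windows-Builder 180 && virsh start Windows-Builder
```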
Before planned VM shutdowns, consider a manual VM backup if the change is risky:
```bash
/bin/bash /boot/config/custom/agrarian-vm-backup.sh --shutdown-running --vm Windows-Builder
/bin/bash /boot/config/custom/agrarian-vm-backup.sh --shutdown-running --vm Ubuntu-Codex
```
## Windows-Builder Recovery
Use these in order:
1. Confirm the VM is running:
```bash
virsh domstate Windows-Builder
virsh domiflist Windows-Builder
```
2. From `Ubuntu-Codex`, confirm RDP is listening:
```bash
nc -vz -w 5 192.168.5.12 3389
```
3. When possible, check the VM through the QEMU guest agent before relying on RDP.
4. If RDP is down but guest commands work, check:
```powershell
Get-Service TermService
Get-NetConnectionProfile
Get-NetFirewallRule -DisplayGroup "Remote Desktop"
```
5. Restart RDP only if it is not listening:
```powershell
Restart-Service TermService -Force
```
6. If Unreal visual inspection is needed, use Sunshine/Moonlight instead of RDP.
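For step 3, one way to confirm the guest-agent path works is to send a `guest-ping` from `DevBox`. This is a sketch that assumes the QEMU guest agent is installed and running inside `Windows-Builder`; `guest_agent_alive` is a hypothetical helper name and the 5-second timeout is an arbitrary choice.

```bash
# Hypothetical helper: succeed if the QEMU guest agent in the given VM
# answers a guest-ping within 5 seconds.
guest_agent_alive() {
  local vm="$1"
  virsh qemu-agent-command "$vm" --timeout 5 '{"execute":"guest-ping"}' >/dev/null 2>&1
}

# Example, run on DevBox:
#   guest_agent_alive Windows-Builder && echo "guest agent responding"
```

If the ping fails while `virsh domstate` still reports `running`, the guest is likely hung or the agent service is stopped, so RDP checks from step 4 onward may not be reachable either.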
Detailed Windows-Builder references:
- `Docs/Ops/WindowsBuilderNetworkRdpStability.md`
- `Docs/Ops/WindowsBuilderGpuRemoteAccess.md`
## Ubuntu-Codex Recovery
Use these in order:
1. Confirm VM state from `DevBox`:
```bash
virsh domstate Ubuntu-Codex
virsh domiflist Ubuntu-Codex
```
2. Confirm SSH or console access.
3. Confirm the project mount:
```bash
mount | rg '/mnt/projects'
ls -la /mnt/projects/AgrarianGameBulid
```
4. If `/mnt/projects` is missing, remount the SMB share using the existing system mount configuration rather than creating a new ad hoc mount.
5. Confirm Git access:
```bash
git -C /mnt/projects/AgrarianGameBulid status --short
git -C /mnt/projects/AgrarianGameBulid remote -v
```
Do not wipe local changes to recover the VM. Preserve uncommitted work first with a commit, patch, or backup copy outside the repo.
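For step 4, remounting from the existing configuration could look like the sketch below. It assumes the share is defined in `/etc/fstab`, which this runbook does not state; if the mount is managed another way (for example a systemd mount unit), use that mechanism instead. `ensure_projects_mount` is a hypothetical helper and needs root privileges to mount.

```bash
# Hypothetical helper: remount the share from the system configuration
# only if it is not already mounted. Assumes an /etc/fstab entry exists.
ensure_projects_mount() {
  local mount_point="${1:-/mnt/projects}"
  if mountpoint -q "$mount_point"; then
    return 0   # already mounted, nothing to do
  fi
  # `mount <dir>` with a single argument looks the entry up in /etc/fstab.
  mount "$mount_point" && mountpoint -q "$mount_point"
}

# Example, run as root on Ubuntu-Codex:
#   ensure_projects_mount && ls -la /mnt/projects/AgrarianGameBulid
```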
## DevBox And SMB Recovery
Use these in order:
1. Confirm the host is reachable:
```bash
ping -c 3 192.168.5.8
```
2. Confirm Unraid services and VM state through the Unraid UI or SSH.
3. Confirm the `projects` share is visible:
```bash
smbclient -L //192.168.5.8 -N
```
4. Confirm Ubuntu-Codex sees the share mounted:
```bash
mount | rg '/mnt/projects'
```
5. If the DNS name `DevBox` fails to resolve but the IP address works, use the IP temporarily and repair the router/DNS record later.
6. Avoid storing project files directly on the Unraid OS boot filesystem. Project data belongs on the `projects` share or in VMs.
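The name-versus-IP fallback in step 5 can be sketched as follows. `devbox_address` is a hypothetical helper; it uses `getent hosts`, which consults the system resolver and may not reflect every name-resolution path an SMB client uses.

```bash
# Hypothetical helper: print the name DevBox if it resolves,
# otherwise fall back to the documented IP address.
devbox_address() {
  if getent hosts DevBox >/dev/null 2>&1; then
    echo "DevBox"
  else
    echo "192.168.5.8"
  fi
}

# Example:
#   smbclient -L "//$(devbox_address)" -N
```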
## When To Stop And Inspect Before Rebooting
Pause before rebooting if:
- A build, package, backup, or VM disk copy is running.
- Unreal Editor is open with unsaved assets.
- Git shows uncommitted changes whose origin or purpose is not understood.
- A backup or restore test is in progress.
- DevBox has disk or array warnings.
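The Git condition above is easy to check from a script before a reboot. The sketch below covers only that one item; `repo_is_clean` is a hypothetical helper, and the other conditions (running builds, open editors, array warnings) still need a manual look.

```bash
# Hypothetical pre-reboot guard: succeed when the working tree is clean.
# Returns 1 and lists the changes if there are any, 2 if git itself fails.
repo_is_clean() {
  local repo="${1:-/mnt/projects/AgrarianGameBulid}"
  local changes
  changes="$(git -C "$repo" status --porcelain)" || return 2
  if [ -n "$changes" ]; then
    printf 'Uncommitted changes in %s:\n%s\n' "$repo" "$changes" >&2
    return 1
  fi
}

# Example:
#   repo_is_clean || echo "Pause: review uncommitted work before rebooting"
```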
## After Recovery
After any VM or `DevBox` recovery:
1. Confirm `/mnt/projects/AgrarianGameBulid` is reachable.
2. Run `git status --short` inside the repo and review any unexpected changes.
3. Confirm Windows-Builder RDP with `nc -vz -w 5 192.168.5.12 3389`.
4. Confirm Sunshine only if visual inspection is needed.
5. Record unusual recovery steps in the handoff notes or the relevant ops doc.
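Steps 1 to 3 can be bundled into one check, run from `Ubuntu-Codex`. This is a sketch rather than existing tooling: `post_recovery_check` is a hypothetical helper that reuses the path, address, and port documented above and reports each failure instead of stopping at the first one.

```bash
# Hypothetical helper: run the scripted post-recovery checks.
# Returns non-zero if any check fails; Sunshine is left as a manual check.
post_recovery_check() {
  local repo="${1:-/mnt/projects/AgrarianGameBulid}"
  local failed=0

  if [ ! -d "$repo" ]; then
    echo "FAIL: $repo is not reachable" >&2
    failed=1
  elif ! git -C "$repo" status --short; then
    echo "FAIL: git status failed in $repo" >&2
    failed=1
  fi

  if ! nc -vz -w 5 192.168.5.12 3389; then
    echo "FAIL: Windows-Builder RDP port 3389 is not reachable" >&2
    failed=1
  fi

  return "$failed"
}

# Example:
#   post_recovery_check && echo "baseline checks passed"
```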