Farm HPC cluster status is Operational

Status history covers Mon 16 through Sun 22 and now (charts not reproduced) for the following components:

Farm HPC cluster Login
Farm HPC cluster Storage
Farm HPC cluster File transfer node
Farm HPC cluster high2,med2,low2
Farm HPC cluster high,med,low
Farm HPC cluster bmh,bmm
Farm HPC cluster bigmemh,bigmemm

Last updated 1 minute ago from official status page.

Active Incidents

Farm's slurmdbd having intermittent issues
Started 24 Apr 2025 00:22:22 (2 months ago), still ongoing
Severity: Major Incident
Status: Investigating
Affected component: Slurm

Farm's slurmdbd is having intermittent issues. If you see an error like the one below, the problem has occurred again, and we will restart slurmdbd to bring it back into service.

"""
sacctmgr: error: _open_persist_conn: failed to open persistent connection to host:monitoring-ib:6819: Connection timed out
sacctmgr: error: Sending PersistInit msg: Connection timed out
"""

We have a support case open with SchedMD and will update this issue as we learn more.
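
For users who script around Slurm commands, the error above can be recognized programmatically so a job wrapper can tell this known outage apart from other failures. The following is a minimal sketch, not part of the official guidance: the function name `find_slurmdbd_timeout` and the regular expression are illustrative assumptions, based only on the error text quoted in this incident.

```python
import re

# Signature of the failure quoted in the incident report.
# The pattern is an assumption derived from that single sample message.
PERSIST_CONN_ERROR = re.compile(
    r"_open_persist_conn: failed to open persistent connection "
    r"to host:(?P<host>[^:\s]+):(?P<port>\d+)"
)


def find_slurmdbd_timeout(output: str):
    """Return (host, port) if the text contains the slurmdbd connection
    error described in the incident, otherwise None."""
    match = PERSIST_CONN_ERROR.search(output)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


# Sample text copied from the incident description.
sample = (
    "sacctmgr: error: _open_persist_conn: failed to open persistent connection "
    "to host:monitoring-ib:6819: Connection timed out\n"
    "sacctmgr: error: Sending PersistInit msg: Connection timed out\n"
)

if __name__ == "__main__":
    print(find_slurmdbd_timeout(sample))
```

In practice the `output` argument would be the captured error stream of a command such as `sacctmgr`. A match suggests the issue has recurred and that retrying after the administrators restart slurmdbd is more useful than debugging the job itself.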

Recently Resolved Incidents

Datacenter Maintenance
Started 16 Jun 2025 16:00:37 (7 days ago), resolved 23 Jun 2025 16:38:46 (30 minutes ago)
Severity: Minor Incident
Status: Resolved
Affected components: Login, Proxmox Virtualization Nodes, Storage, Ganetti cluster, File transfer node, and others (list truncated on the source page)

Bi-annual maintenance due to data center generator test and upgrades.

    Farm HPC cluster Components

    Status history for each component covers Mon 16 through Sun 22 and now (charts not reproduced). The resolved Datacenter Maintenance incident (started 16 Jun 2025 16:00:37, resolved 23 Jun 2025 16:38:46) is listed under every component below, including both Virtualization sub-components. Slurm additionally lists the ongoing slurmdbd incident (started 24 Apr 2025 00:22:22).

    Farm HPC cluster Login
    Farm HPC cluster Storage
    Farm HPC cluster File transfer node
    Farm HPC cluster high2,med2,low2
    Farm HPC cluster high,med,low
    Farm HPC cluster bmh,bmm
    Farm HPC cluster bigmemh,bigmemm
    Farm HPC cluster bgpu
    Farm HPC cluster gpuh,gpum
    Farm HPC cluster Email
    Farm HPC cluster Virtualization
        Proxmox Virtualization Nodes
        Ganetti cluster
    Farm HPC cluster Slurm
    Farm HPC cluster Software