I see two large inefficiencies in condos and one lost strategic advantage:
- Capital utilization is low due to the private nature of each condo.
- Because the purchased hardware is a sunk cost, frivolous utilization is encouraged.
- The lost advantage is flexibility of allocation, both for emergencies and for competitive advantage.
The result is almost always that only members of a condo are allowed to use its nodes, while access for non-members is non-existent or severely limited. Because the workload inside a single condo lacks diversity, utilization is low. Data center space, network ports, and so on are consumed by machines that are, on average, 50% idle.
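To make the cost of that idle time concrete, here is a minimal sketch of the arithmetic. The node price, core count, and service life are hypothetical numbers chosen for illustration; only the 50% idle figure comes from the paragraph above.

```python
HOURS_PER_YEAR = 365 * 24


def cost_per_used_core_hour(node_price, cores, lifetime_years, utilization):
    """Spread the purchase price over the core-hours that actually run jobs."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    available = cores * lifetime_years * HOURS_PER_YEAR
    return node_price / (available * utilization)


# Hypothetical node: $8,000, 32 cores, 5-year service life.
full = cost_per_used_core_hour(8000, 32, 5, 1.0)
half = cost_per_used_core_hour(8000, 32, 5, 0.5)
print(f"fully used: ${full:.4f} per core-hour")
print(f"50% idle:   ${half:.4f} per core-hour ({half / full:.1f}x)")
```

Whatever the real prices are, the ratio is what matters: at 50% idle, every core-hour that does useful work carries twice the capital cost.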
Because researchers pay the full cost upfront, and these nodes are not always utilized, the marginal cost of running a job on them is very low. As a result, jobs that are very low priority, or even frivolous, the kind no agency would be willing to fund, are allowed to run. If the hardware is there, why not?
I actually think this is a good thing for condos. If marginal cost is low, you might as well utilize the hardware. My proposed replacement puts the squeeze on this sort of marginal work.
Condos are slow to change and static. They also have high start-up costs: because researchers own the hardware, its cost is large, paid upfront, and paid all at once. This pushes out small projects that could benefit from limited access to HPC resources. If a group could rent just 100 cores for a week, it would cost less than one machine and provide greater benefit. Condos also do not allow for bursting, that is, getting resources quickly in an emergency.
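The rent-versus-buy comparison can be sketched as follows. The node price and the rental rate are hypothetical placeholders, not quotes from any provider; only the "100 cores for a week" request comes from the text above.

```python
def core_hours(cores, days):
    """Total core-hours in a rental of `cores` cores for `days` days."""
    return cores * days * 24


def rental_cost(cores, days, rate_per_core_hour):
    """Cost of renting at a flat per-core-hour rate."""
    return core_hours(cores, days) * rate_per_core_hour


def break_even_rate(node_price, cores, days):
    """Per-core-hour rate at which the rental costs as much as one node."""
    return node_price / core_hours(cores, days)


NODE_PRICE = 8000.0  # hypothetical purchase price of one node
RATE = 0.05          # hypothetical rental rate, dollars per core-hour

print(f"core-hours needed: {core_hours(100, 7)}")
print(f"rental cost:       ${rental_cost(100, 7, RATE):,.2f}")
print(f"one node costs:    ${NODE_PRICE:,.2f}")
print(f"break-even rate:   ${break_even_rate(NODE_PRICE, 100, 7):.3f} per core-hour")
```

Under these assumed numbers the week-long rental is a fraction of the price of a single node, and the renting group gets 100 cores at once instead of one machine's worth. The break-even rate shows how expensive rental would have to be before buying wins for a one-off job.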
In a later post I will put down my thoughts as to a solution to these problems.