Hyper-converged Infrastructure as a Service (HCIaaS)

Table of Contents

    • Paper Goals
    • MaaS Limitations
    • Proposed Solution
    • Few studies show
    • PVE HCI Main Advantages
    • VMs and Containers
    • Community Enterprise OS
    • Demo Requirements
    • FAQ
    • Conclusion
    • More information

Paper Goals

    1. Move High Performance Compute hosts from Machine as a Service (MaaS) to a Hyper-converged Infrastructure as a Service (HCIaaS).
    2. Move VMs and containers that use centralized storage over iSCSI to a Hyper-converged Infrastructure as a Service to save space, energy, and licensing costs, and to boost performance.

Why HCIaaS?

    • Transforms the current MaaS infrastructure into Infrastructure as Code.
    • Adds flexibility, agility, and high availability, targeting up to a 99.999% SLA.
    • Boosts storage performance up to 5 times by using NVMe storage on the M.2 interface.
    • Keeps Intellectual Property fully under the company’s control, with encryption and secure access.
    • Uses commodity hardware and open source software under GPL licensing.
    • Consumes less space and less energy, which saves the business money on electricity, floor space, licensing fees, and third-party support subscriptions.
    • Can be built in-house or on a cloud-based MaaS such as OVH servers.
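To put the 99.999% figure in perspective, the short sketch below converts an availability percentage into a yearly downtime budget. It is plain arithmetic, not a measurement of any PVE cluster; the function name and the use of an average 365.25-day year are assumptions made for illustration.

```python
# Average year length in minutes (assumption: 365.25 days, to account for leap years).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for sla in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% availability -> {minutes:.2f} minutes of downtime per year")
```

At 99.999%, the budget works out to a little over five minutes per year, which is why this level is usually only reachable with automated failover such as HA and live migration.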

MaaS Limitations

When HPC physical hosts are consumed as MaaS, performance, scalability, elasticity, agility, and high availability are very limited or not available at all. As a result, service deliverability and business deadlines are often impacted.

Proposed Solution

The Proxmox suite, or Proxmox Virtual Environment (PVE), is built on the Kernel-based Virtual Machine (KVM). KVM is the technology that will sit at the core of PVE HCIaaS to manage the HCIaaS abstraction layer. The PVE/Ceph suite is widely regarded as a drop-in replacement for expensive solutions such as the vSphere suite or Hyper-V, especially when the environment mostly runs GNU/Linux systems. The PVE/Ceph suite OS is based on Debian, one of the most stable and secure GNU/Linux distributions.

No need for centralized storage: With the PVE/Ceph HCI solution, VM datastores are distributed among the hypervisor nodes, so compute and storage resources are sliced and spread across the hypervisors. Therefore, there is no need for centralized storage such as NFS or iSCSI datastores to run the compute VMs. Note: services like NFS mounts can still be used the way they are used today, but mounted from the VM instead of the physical server.

Recommended approach: build a separate infrastructure for hyper-converged compute, then migrate compute resources little by little, as virtual compute hosts, to the new hyper-converged IaaS.

Few studies show

Ceph and KVM studies have shown that switching to an HCI solution not only increases server performance substantially but also saves the company money on hardware and software licensing. The study linked below shows how High-Performance Computing nodes using Ceph can take advantage of low-profile hardware to save thousands or even millions of dollars (for example, expensive RAID controllers are no longer required).

The study shows how low-profile machines can outperform high-end centralized storage. Nowadays, especially when NVMe storage on the M.2 interface is used, read/write access can be up to 5 times faster than over Serial ATA or NFS/iSCSI.

http://cdn.opensfs.org/wp-content/uploads/2013/04/Weil-Ceph-20130416-LUG.pdf

PVE HCI Main Advantages

    • GNU/Linux: GPL-licensed, based on Debian GNU/Linux, completely Libre/Free software. No license fees or vendor lock-in, and no restrictions or proprietary software at the source level. The architecture team can adjust and modify the HCIaaS as needed to fit the company’s business needs.
    • HCI Technology: Proxmox HCI tightly integrates compute, storage, and networking resources to manage highly available clusters, backup/restore, and disaster recovery. All components are software-defined and compatible with one another, so they can be administered like a single system through a centralized web interface.
    • Web Management: With the PVE web interface, there is no client software to install and no separate management server like vCenter is needed. PVE can run as a single node or be assembled into a cluster cell through the HTML5 interface. All tasks and compute resources can be managed through the web interface or from bash, which supports Infrastructure as Code, especially when a tool like https://www.terraform.io/ is used. More Info…
    • RESTful API: JSON is the primary data format, and the whole API is formally defined using JSON Schema to enable fast and easy integration with third-party management tools such as custom hosting environments.
    • Security: The web interface can be configured to use Let’s Encrypt TLS certificates and two-factor authentication (2FA) for secure login.
    • LDAP/AD Support: PVE supports multiple authentication sources like Active Directory, LDAP, Linux PAM standard authentication or the built-in Proxmox VE authentication server.
    • NUMA Support: Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing. It can be crucial for chip engineering tools. More Info…
    • Commodity HW: Low-profile hardware nodes with decent CPUs and an adequate amount of memory are all that is needed to run the PVE/Ceph suite.
    • Storage Technology: Works well with the latest storage technology such as M.2 NVMe for very high IOPS. Read/write access can reach up to 3.2GB/s per virtual compute host/VM vs. 600MB/s for a physical host using Serial ATA.
    • Ceph Cluster: With its high I/O performance, Ceph distributes datastores among the PVE hypervisor nodes for redundancy, which enables features like HA and live migration.
    • CephFS: Provides a clustered, highly available shared filesystem. Its Metadata Servers ensure that files are balanced across the whole Ceph cluster, so even heavy load will not overwhelm a single node, which can be an issue with traditional shared filesystem approaches like NFS. PVE supports both 1) using an existing CephFS as storage for backups, ISO files, and container templates, and 2) creating a hyper-converged CephFS itself.
    • Old Guest Support: Allows old GNU/Linux systems like SLES 11/12 or SLED 11/12 to run smoothly without dealing with the trimming requirements of newer technology such as NVMe storage. Guest VMs see their virtual hard drives as regular ATA drives, so they can use any filesystem and remain unaware of the NVMe storage layer.
    • Golden Images: Golden images, as VMs or LXC containers, can be used to roll out compute VMs on demand in a matter of seconds. Thousands of virtual machines can be rolled out in a short time, vs. days of waiting for the MaaS team to deploy physical servers or for an imaging script to finish.
    • Memory Boost: When more memory is needed, VMs can use local NVMe as swap space.
    • Thin provisioning: VMs and LXC containers can use thin provisioning and snapshots, which makes backup and restore a snap.
    • No HW RAID: The PVE OS is Debian and can use software RAID1. The rest of the NVMe storage chips are plain JBOD, installed in PCI Express slots using adapter cards or in the built-in bays. All NVMe chips are managed by Ceph across the hypervisors for redundancy and high availability without HW RAID, similar to ZFS.
    • Central Management Interface: https://pve.Proxmox.com/pve-docs/pve-admin-guide.html#_central_management
    • Features of PVE: https://www.Proxmox.com/en/Proxmox-ve/features
    • Compare with other Locked Vendors: https://www.Proxmox.com/en/Proxmox-ve/comparison
    • PVE Container Ready Linux Templates: http://download.Proxmox.com/images/system/
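As a rough illustration of the RESTful API and Golden Images points above, the sketch below builds (but does not send) two HTTPS requests against the PVE API: one to list cluster nodes and one to clone a golden-image VM. The host name, token ID, secret, node name, and VM IDs are all hypothetical placeholders, and the sketch assumes API token authentication is available on the installed PVE version. Treat it as a starting point under those assumptions, not as a reference client.

```python
import urllib.parse
import urllib.request

# Hypothetical values for illustration only; replace with real ones.
PVE_HOST = "pve1.example.com"
TOKEN_ID = "automation@pve!demo"
TOKEN_SECRET = "00000000-0000-0000-0000-000000000000"


def build_request(path, params=None):
    """Build a PVE API request: GET without params, form-encoded POST with params."""
    url = f"https://{PVE_HOST}:8006/api2/json{path}"
    data = urllib.parse.urlencode(params).encode() if params else None
    method = "POST" if params else "GET"
    request = urllib.request.Request(url, data=data, method=method)
    # API tokens are passed in the Authorization header.
    request.add_header("Authorization", f"PVEAPIToken={TOKEN_ID}={TOKEN_SECRET}")
    return request


# List the nodes of the cluster.
list_nodes = build_request("/nodes")

# Clone golden image VM 9000 on node "pve1" into a new VM 9001 (IDs are made up).
clone_vm = build_request(
    "/nodes/pve1/qemu/9000/clone",
    {"newid": 9001, "name": "hpc-node-01"},
)

if __name__ == "__main__":
    for request in (list_nodes, clone_vm):
        print(request.get_method(), request.full_url, request.data)
```

Passing either request to urllib.request.urlopen would send it to the cluster. A tool like Terraform wraps the same API calls behind a declarative configuration.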

VMs and Containers

A perfect fit for VMs and containers. GNU/Linux instances can run inside containers instead of VMs if desired, which saves considerable resources and allows over-subscription thanks to a very low footprint compared to VMs. Containers can be roughly 10 times lighter and faster than VMs, so the hypervisor can fit more instances. Setting up LXC with PVE/Ceph is a snap, which makes it a good match for GNU/Linux VMs and containers, and containers can be controlled through the RESTful API or the web interface.

OpenText and NoMachine drop-in replacement

Proxmox allows VMs to be accessed through an HTML5 console or a SPICE client. GNU/Linux instances can therefore be used over an SSL/TLS-secured web interface, or through a SPICE client (https://www.spice-space.org/), which can be installed on a Windows laptop to connect to either a GNU/Linux or a Windows machine.

Community Enterprise OS

The company can save more money by switching to openSUSE Leap 15.0, the free-of-charge community counterpart of SUSE Linux Enterprise. openSUSE Leap 15.0 can be tweaked and tested, deployed across all company Linux instances, and then deployed on the server side for HPC VM compute hosts if desired. The company’s hardware and software engineers will benefit from the latest Leap technology; eventually all engineering tools can be tweaked and migrated from SLED/SLES 11/12 to openSUSE Leap 15.0.

SLES/SLED vs openSUSE Leap 15.0

The relationship between SLED/SLES 11.4/12.4 and openSUSE Leap 15.0 is like the one between Red Hat Enterprise Linux and CentOS: apart from the Red Hat support subscription, there is little difference between the two beyond the branding. SUSE has taken the same route with Leap 15.0 to compete with Red Hat and Ubuntu. Leap is stable across desktops and servers and uses modern GNU/Linux and open ecosystem technology. More Info…

Download link: https://software.opensuse.org/distributions/leap

Demo Requirements

A PVE cluster cell requires at least 3 nodes. 5 or 7 nodes are preferred to start the Demo test.
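The preference for 3, 5, or 7 nodes follows from majority-based cluster quorum: the cluster stays operational only while more than half of the votes are present. The sketch below assumes one vote per node, which is the default, and uses illustrative function names. It shows that an even node count adds hardware without adding failure tolerance.

```python
def quorum(nodes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can fail while the remaining ones still hold quorum."""
    return nodes - quorum(nodes)


if __name__ == "__main__":
    for nodes in (3, 4, 5, 6, 7):
        print(
            f"{nodes} nodes: quorum = {quorum(nodes)}, "
            f"tolerated failures = {tolerated_failures(nodes)}"
        )
```

A 4-node cluster tolerates one failure, the same as a 3-node cluster, while 5 nodes tolerate two and 7 nodes tolerate three.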

Hardware: Supermicro servers with the following specs will do the job. Servers with built-in M.2 bays (at least 12 bays per hypervisor) or a few x16 PCI Express slots are preferred. For example, here is a Supermicro server that is a few years old.

Supermicro Specs:

    • Model: SYS-7048GR-TR
    • CPU Type: Intel Xeon E5-2600 v4 / v3 family
    • CPU Socket Type: Dual LGA 2011 (chipset: Intel C612)
    • Memory Slots: 16 x 288Pin – Up to 2TB ECC 3DS LRDIMM, 1TB ECC RDIMM
    • PCI Express: 4 x PCI-E 3.0 x16 (double-width)
    • SATA Ports: 2 SSD drives for the Debian OS.

PVE/Ceph Requirements:

    • CPUs: at least 2 Intel CPUs, 24 physical cores each with threading capability
    • RAM: At least 128GB or 256GB per hypervisor node
    • NVMe: More, smaller NVMe chips are better than fewer, larger ones. Best practice: 12 NVMe chips of 256GB each, for a total of 3TB of storage for Ceph OSDs to use. NVMe chips can be installed in built-in NVMe bays or on x16 PCI Express cards.
    • Network: 2 or 4 fiber/Ethernet NICs with two 10Gb/s ports each: one NIC for Proxmox/Ceph communication and a second NIC for the LSF VM hosts subnet.
    • PCI Express: 4 or 6 x PCI-E 3.0 x16 slots, depending on the server’s number of M.2 bays and the number and speed of its built-in NICs. If the server has no M.2 bays and no built-in 10Gb/s NICs, x16 PCI Express slots can be used instead.
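The NVMe sizing above can be turned into a quick capacity estimate. The sketch below computes raw capacity per node and per cluster, then divides by a replication factor to approximate usable space. The replication factor of 3 is an assumption (it is Ceph's default for replicated pools, but this paper does not specify one), and the estimate ignores filesystem overhead and the free-space headroom Ceph needs in practice.

```python
def raw_capacity_gb(nodes: int, chips_per_node: int = 12, chip_gb: int = 256) -> int:
    """Raw NVMe capacity across the cluster, in GB."""
    return nodes * chips_per_node * chip_gb


def usable_capacity_gb(raw_gb: int, replicas: int = 3) -> float:
    """Approximate usable capacity when every object is stored `replicas` times.

    replicas=3 is an assumed value, not one taken from this paper.
    """
    return raw_gb / replicas


if __name__ == "__main__":
    print(f"Per node: {raw_capacity_gb(1)} GB raw")
    for nodes in (3, 5, 7):
        raw = raw_capacity_gb(nodes)
        usable = usable_capacity_gb(raw)
        print(f"{nodes} nodes: {raw} GB raw, about {usable:.0f} GB usable")
```

One node with 12 chips of 256GB gives 3072GB raw, which matches the 3TB figure above. A 5-node demo cluster would hold 15360GB raw, or roughly 5TB usable under the assumed three-way replication.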

Software Requirements:

    • GNU/Linux Debian Distro.
    • PVE Suite to manage KVM as HCIaaS.
    • Ceph for storage distribution among the hypervisor nodes (it is part of the PVE suite).

FAQ

How does the M.2 interface compare to SATA or SAN cards, and why is it 5 times faster?

The current Serial ATA interface rate is 6Gb/s (newer is 10Gb/s). Whether a SAS, SATA, or even SSD drive is used, drives are limited by their interface speed. NVMe uses the M.2 interface with a 32Gb/s interface rate, which allows NVMe storage to read and write at up to 3.2GB/s vs. 600MB/s for the Serial ATA interface. Therefore, 3.2GB/s / 600MB/s ≈ 5.3, or roughly 5 times faster than Serial ATA read/write performance. Some experiments have even shown 7 times faster than Serial ATA SSDs. More Info…
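The figures above can be reproduced with a small calculation. The sketch assumes 10 line bits per payload byte, which matches Serial ATA's 8b/10b encoding and is the rule of thumb that turns 6Gb/s into 600MB/s and 32Gb/s into 3.2GB/s. PCI Express 3.0 actually uses a more efficient encoding, so the real ceiling for a 32Gb/s link is somewhat higher than the number used in this paper. The function name is illustrative.

```python
def payload_mb_per_s(line_rate_gbps: float, line_bits_per_byte: int = 10) -> float:
    """Convert an interface line rate (Gb/s) into payload throughput (MB/s).

    line_bits_per_byte=10 is the rule of thumb assumed here (8b/10b encoding).
    """
    return line_rate_gbps * 1000 / line_bits_per_byte


sata = payload_mb_per_s(6)    # Serial ATA at 6Gb/s
nvme = payload_mb_per_s(32)   # M.2 NVMe at 32Gb/s
ratio = nvme / sata

if __name__ == "__main__":
    print(f"Serial ATA: {sata:.0f} MB/s")
    print(f"M.2 NVMe:   {nvme:.0f} MB/s")
    print(f"Ratio:      {ratio:.2f}x")
```

The ratio comes out at about 5.3, which is where the "5 times faster" figure comes from.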

So a 10Gb/s SAN card, which provides about 1GB/s of read/write access, does not come close to 3.2GB/s. In a cluster setup, even faster access is achievable. When Ceph OSDs are spread over multiple NVMe chips (minimum 12 chips) per hypervisor node, performance rises sharply because the NVMe chips work in parallel, with performance similar to RAID10 or RAID50. In theory this can saturate the motherboard bus: 12 chips * 3.2GB/s = 38.4GB/s read/write access, and adding the M.2 ports together gives a 384Gb/s aggregate interface rate. SAN fiber cannot come close, even with the most expensive fabric.
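The aggregate numbers can be checked the same way. The sketch below simply sums the per-chip ceilings quoted in this FAQ and compares the result with a single 10Gb/s SAN card. It is a theoretical upper bound built from the figures in the text, not a benchmark: real throughput also depends on the CPU, the PCI Express lanes available, and Ceph replication traffic over the network. All constant names are illustrative.

```python
CHIPS_PER_NODE = 12        # minimum recommended in this paper
CHIP_MB_PER_S = 3200       # 3.2GB/s per NVMe chip, as quoted above
CHIP_LINK_GBPS = 32        # 32Gb/s per M.2 port, as quoted above
SAN_CARD_MB_PER_S = 1000   # about 1GB/s for a 10Gb/s SAN card, as quoted above


def aggregate(per_device: int, devices: int) -> int:
    """Sum identical per-device ceilings across all devices in a node."""
    return per_device * devices


node_mb_per_s = aggregate(CHIP_MB_PER_S, CHIPS_PER_NODE)
node_link_gbps = aggregate(CHIP_LINK_GBPS, CHIPS_PER_NODE)
advantage = node_mb_per_s / SAN_CARD_MB_PER_S

if __name__ == "__main__":
    print(f"Per-node ceiling: {node_mb_per_s / 1000:.1f} GB/s")
    print(f"Per-node aggregate link rate: {node_link_gbps} Gb/s")
    print(f"Ceiling vs. one 10Gb/s SAN card: {advantage:.1f}x")
```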

Will NFS or iSCSI ever be needed with PVE/Ceph to run VMs?

When PVE is used with Ceph distributed storage, NFS datastores or centralized storage are never required to run the compute VM hosts or LXC containers. The PVE hypervisor nodes share Ceph distributed storage among them, so all VMs and containers run on local NVMe storage that is sliced and managed by Ceph, which in turn distributes the datastore volumes among all hypervisors for redundancy and high availability, enabling features like HA and live migration.
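To give a feel for how copies of a datastore object can be spread over hypervisor nodes without a central lookup table, here is a toy placement sketch. It uses rendezvous (highest-random-weight) hashing, which is NOT Ceph's actual CRUSH algorithm; it only illustrates the general idea that placement is computed from the object name and the node list. The node names, the object name, and the replica count of 3 are assumptions for illustration.

```python
import hashlib


def score(node: str, obj: str) -> int:
    """Deterministic pseudo-random weight for a (node, object) pair."""
    digest = hashlib.sha256(f"{node}:{obj}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, nodes: list, replicas: int = 3) -> list:
    """Pick `replicas` distinct nodes for an object: the highest-scoring ones."""
    ranked = sorted(nodes, key=lambda node: score(node, obj), reverse=True)
    return ranked[:replicas]


# Hypothetical cluster and object names.
nodes = ["pve1", "pve2", "pve3", "pve4", "pve5"]
before = place("vm-101-disk-0", nodes)

# Simulate losing the first replica holder.
survivors = [node for node in nodes if node != before[0]]
after = place("vm-101-disk-0", survivors)

if __name__ == "__main__":
    print("Replicas before failure:", before)
    print("Replicas after failure: ", after)
```

When a node fails, the two surviving replicas stay where they were and only one new copy has to be created on another node. That property is what allows VMs to keep running and be restarted or live-migrated elsewhere.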

Can PVE/Ceph be considered as mature as SAN-based IaaS such as EMC/VMware?

The Ceph distributed file system matured several years ago, especially among KVM hypervisor nodes for HCIaaS solutions. The technology is solid and stable, and what has put Ceph in high demand lately is the arrival of the M.2 interface for NVMe storage, which made HCIaaS solutions like PVE a much better option than SAN fabrics. Today, both the PVE/Ceph suite and NVMe are mature, stable, and tested by thousands of businesses around the world.

How does SAN IaaS compare to HCIaaS in performance?

A SAN server’s 10Gb/s fiber storage card never brought much to the table compared to a local 6Gb/s or 10Gb/s Serial ATA interface. As more and more virtual machines are added to NFS datastores, performance eventually degrades because the core design of a SAN is “Centralized Storage”, which is where the bottleneck eventually appears.

When performance is the main requirement, it is highly recommended to stay away from centralized storage if possible. Even expensive SAN fabrics cannot compete with HCI Ceph local distributed storage, simply because a SAN cannot distribute local storage among hypervisor nodes. In addition, CPU, memory, and NUMA technology work faster with locally accessed storage.

What are the main advantages of using PVE/Ceph over SAN/IaaS solutions?

I believe the main advantage of using SAN with IaaS is not performance so much as enabling features like high availability, fault tolerance, auto scalability, elasticity, live migration, and more at the virtual machine level, which was never possible at the physical machine level in MaaS.

The main advantage of an HCI solution like PVE/Ceph is that it gives the IaaS solution the same services that SAN provided, but with a distributed rather than centralized storage design (no bottleneck), load balancing among hypervisor nodes, very high performance (thanks to the M.2 interface and NVMe), more compute resources per hypervisor node, stability, scalability, less space required, affordable commodity HW, and no licensing fees. All this will save the company tons of money, labor hours, and energy compared to IaaS using SAN.

Conclusion

EMC, VMware, Oracle, and NetApp do not have an HCI solution for IaaS equal to the PVE/Ceph HCI suite, simply because 1) their business model is based on licensing fees and vendor lock-in, 2) they will always push their expensive centralized SAN storage because of the revenue they make from licensing fees and support subscriptions, and 3) if one day they do provide such an HCI solution, it will be extremely expensive due to licensing fees.

As you can see, the PVE/Ceph suite is not only suitable for HPC compute hosts as VMs; it is even more suitable for managing GNU/Linux and Windows instances as well, which will save the company even more money on licensing and compute resources, with no restrictions, limits, or vendor lock-in on the source code. Besides, everything uses GPL Libre/Free licensing and software.

As an fsf.org member, I will be glad to see companies one day using the fastest, greatest, and most stable GNU/Linux technologies and solutions, mainly for the following reasons:

    • To support the Libre/Free Software Foundation, GNU ecosystems, and GPL based licensing – they deserve all our support.
    • To save businesses millions of dollars in licensing and support subscriptions.
    • To allow us as GNU/Linux IT engineers to be more involved in our in-house infrastructure solutions, which in turn leverages and enhances our GNU/Linux knowledge and experience with Libre ecosystem software packages.

Finally, enterprise GNU/Linux solutions such as KVM, Ceph, and Proxmox are all tested, proven, stable, and reliable technologies. If support is needed, it is available at any time through affordable subscriptions from PVE and Ceph.

More information

HCI Explained

Please note: Gridstore was either acquired by or transitioned to HyperGrid, with a totally different business model; however, I include their video and PDF only to explain HCI technology.

PVE Admin Guide and Mastering Book

PVE Video Training

PVE YouTube Training

Guides and Studies

Ceph Introduction

Ceph CRUSH Deep Dive

M.2 Interface Explained

Last modified: July 20, 2020