Debug School

Suyash Sambhare

Huge Pages

Memory is managed in blocks known as pages. On most systems, a page is 4Ki. 1Mi of memory is equal to 256 pages; 1Gi of memory is 262,144 pages, and so on. CPUs have a built-in memory management unit that manages a list of these pages in hardware. The Translation Lookaside Buffer (TLB) is a small hardware cache of virtual-to-physical page mappings. If the virtual address passed in a hardware instruction can be found in the TLB, the mapping can be determined quickly. If not, a TLB miss occurs, and the system falls back to slower address translation through the page tables, which hurts performance. Because the size of the TLB is fixed, the only way to reduce the chance of a TLB miss is to increase the page size.
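The page counts above are plain division, and a short sketch makes it easy to see how much a larger page size shrinks the number of mappings the TLB has to cover. The helper name pages_needed is made up for this illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(memory_bytes: int, page_bytes: int) -> int:
    """Number of pages required to map memory_bytes, rounding up."""
    return -(-memory_bytes // page_bytes)


# How many mappings does 1Gi of memory need at each page size?
for label, page in (("4Ki", 4 * KIB), ("2Mi", 2 * MIB), ("1Gi", GIB)):
    print(f"{label} pages to map 1Gi: {pages_needed(GIB, page)}")
```

With 4Ki pages, 1Gi needs 262,144 mappings; with 2Mi pages it needs 512, and with 1Gi pages only one.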

A huge page is a memory page that is larger than 4Ki. On x86_64 architectures, there are two common huge page sizes: 2Mi and 1Gi. Sizes vary on other architectures. To use huge pages, an application must be written so that it is aware of them. Transparent Huge Pages (THP) attempts to automate the management of huge pages without application knowledge, but it has limitations. In particular, it is limited to 2Mi page sizes. THP can also degrade performance on nodes with high memory utilisation or fragmentation, because its defragmentation efforts can lock memory pages. For this reason, some applications are designed to use, or recommend using, pre-allocated huge pages instead of THP.

A node must pre-allocate huge pages before it can report its huge page capacity. A node can only pre-allocate huge pages of a single size.

Huge pages can be consumed through container-level resource requirements using the resource name hugepages-<size>, where <size> is the most compact binary notation using integer values supported on a particular node. For example, if a node supports 2048KiB page sizes, it exposes the schedulable resource hugepages-2Mi. Unlike CPU or memory, huge pages do not support over-commitment.
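The naming rule can be sketched in a few lines. This is only an illustration of "most compact binary notation using integer values", not the code the kubelet uses, and the function name hugepage_resource_name is hypothetical.

```python
def hugepage_resource_name(page_size_kib: int) -> str:
    """Build hugepages-<size> from a page size given in KiB.

    Keeps dividing by 1024 while the value stays a whole number,
    so 2048 KiB becomes 2Mi rather than 2048Ki.
    """
    size, units = page_size_kib, ["Ki", "Mi", "Gi"]
    unit = units.pop(0)
    while units and size % 1024 == 0:
        size //= 1024
        unit = units.pop(0)
    return f"hugepages-{size}{unit}"


print(hugepage_resource_name(2048))     # hugepages-2Mi
print(hugepage_resource_name(1048576))  # hugepages-1Gi
```

A pod then requests that resource name in its limits, as in the example that follows.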

apiVersion: v1
kind: Pod
metadata:
  generateName: hugepages-volume-
spec:
  containers:
  - securityContext:
      privileged: true
    image: rhel7:latest
    command:
    - sleep
    - inf
    name: example
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
    resources:
      limits:
        hugepages-2Mi: 100Mi 
        memory: "1Gi"
        cpu: "1"
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages

Specify the amount of memory for huge pages as the exact amount to be allocated. Do not specify this value as the amount of memory for huge pages multiplied by the page size. For example, with a huge page size of 2Mi, using 100Mi of huge-page-backed RAM for your application means allocating 50 huge pages. OpenShift Container Platform does that arithmetic for you, so, as in the preceding example, you can specify 100Mi directly.
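The conversion the platform performs is simple division, sketched below. The helpers parse_quantity and huge_pages_for are hypothetical names for this illustration, and they only handle the Ki, Mi and Gi suffixes. Rejecting limits that are not a whole number of pages is a choice made in this sketch.

```python
SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_quantity(quantity: str) -> int:
    """Convert a binary quantity such as '100Mi' to bytes."""
    number, suffix = quantity[:-2], quantity[-2:]
    return int(number) * SUFFIXES[suffix]


def huge_pages_for(limit: str, page_size: str) -> int:
    """Number of huge pages of page_size needed to back limit."""
    total, page = parse_quantity(limit), parse_quantity(page_size)
    if total % page:
        raise ValueError(f"{limit} is not a multiple of the {page_size} page size")
    return total // page


print(huge_pages_for("100Mi", "2Mi"))  # 50
```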

Allocating huge pages of a specific size

Some platforms support multiple huge page sizes. To allocate huge pages of a specific size, place the selection parameter hugepagesz=<size> before the hugepages boot parameter. The <size> value must be specified in bytes, with an optional scale suffix [kKmMgG]. The default huge page size can be set with the default_hugepagesz=<size> boot parameter.
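As a rough sketch of how these parameters fit together, the helpers below parse a <size> value with its optional suffix and assemble the boot arguments in the required order (size selector first, then the page count). The function names are invented for this example; only the three kernel parameter names come from the text above.

```python
import re

SCALE = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_hugepagesz(value: str) -> int:
    """Convert a size such as '2M' or '1G' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", value)
    if match is None:
        raise ValueError(f"invalid huge page size: {value!r}")
    number, suffix = match.groups()
    return int(number) * SCALE[suffix.lower()]


def boot_args(size: str, count: int, make_default: bool = False) -> str:
    """Build the kernel arguments; hugepagesz must precede hugepages."""
    parse_hugepagesz(size)  # validate the size before using it
    args = []
    if make_default:
        args.append(f"default_hugepagesz={size}")
    args += [f"hugepagesz={size}", f"hugepages={count}"]
    return " ".join(args)


print(boot_args("2M", 50))  # hugepagesz=2M hugepages=50
```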

  • Huge page requests must equal the limits. This is the default if limits are specified but requests are not.
  • Huge pages are isolated at pod scope. Container isolation is planned for a future iteration.
  • EmptyDir volumes backed by huge pages must not consume more huge page memory than the pod request.
  • Applications that consume huge pages through shmget() with SHM_HUGETLB must run with a supplemental group that matches /proc/sys/vm/hugetlb_shm_group.
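The first requirement is easy to check before submitting a pod. The sketch below takes a container's resources section as a plain dictionary and reports any huge page entry whose request differs from its limit. The function name is hypothetical, quantities are compared as literal strings, and treating a request without a limit as a problem is an assumption of this helper.

```python
def hugepage_resource_problems(resources: dict) -> list:
    """Return a list of problems with the hugepages-* entries."""
    limits = resources.get("limits", {})
    requests = resources.get("requests", {})
    names = {n for n in (*limits, *requests) if n.startswith("hugepages-")}
    problems = []
    for name in sorted(names):
        if name not in limits:
            problems.append(f"{name}: request set without a limit")
            continue
        # An omitted request defaults to the limit.
        request = requests.get(name, limits[name])
        if request != limits[name]:
            problems.append(f"{name}: request {request} != limit {limits[name]}")
    return problems


resources = {"limits": {"hugepages-2Mi": "100Mi", "memory": "1Gi", "cpu": "1"}}
print(hugepage_resource_problems(resources))  # []
```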

Configure huge pages at boot

Nodes must pre-allocate the huge pages used in an OpenShift Container Platform cluster. Huge pages can be reserved in two ways: at boot time and at run time. Reserving at boot time is more likely to succeed because memory has not yet become significantly fragmented. The Node Tuning Operator currently supports boot-time allocation of huge pages on specific nodes.

  • Label every node that needs the same huge pages configuration: $ oc label node <node_using_hugepages> node-role.kubernetes.io/worker-hp=
  • Create a file with the following content and name it hugepages-tuned-boot.yaml:
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: hugepages 
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile: 
  - data: |
      [main]
      summary=Boot time configuration for hugepages
      include=openshift-node
      [bootloader]
      cmdline_openshift_node_hugepages=hugepagesz=2M hugepages=50 
    name: openshift-node-hugepages

  recommend:
  - machineConfigLabels: 
      machineconfiguration.openshift.io/role: "worker-hp"
    priority: 30
    profile: openshift-node-hugepages
In this file:
  • metadata.name: set the name of the Tuned resource to hugepages.
  • spec.profile: set the profile section to allocate huge pages.
  • cmdline_openshift_node_hugepages: the order of the parameters matters, because some platforms support huge pages of various sizes.
  • machineConfigLabels: enables matching based on machine config pools.

Create the Tuned hugepages object: $ oc create -f hugepages-tuned-boot.yaml

Create a file with the following content and name it hugepages-mcp.yaml:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-hp
  labels:
    worker-hp: ""
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-hp]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-hp: ""

Create the machine config pool: $ oc create -f hugepages-mcp.yaml

Provided enough non-fragmented memory is available, every node in the worker-hp machine config pool should now have 50 2Mi huge pages allocated.

$ oc get node <node_using_hugepages> -o jsonpath="{.status.allocatable.hugepages-2Mi}"
100Mi
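The same figure can be cross-checked on the node itself, since the Linux kernel reports huge page counters in /proc/meminfo. The sketch below parses the HugePages_Total, HugePages_Free and Hugepagesize fields. The sample text is made up to match the 50 x 2Mi configuration in this article and is not captured output, and the function name is hypothetical.

```python
def hugepage_summary(meminfo_text: str) -> dict:
    """Extract huge page counters from /proc/meminfo-style text."""
    wanted = ("HugePages_Total", "HugePages_Free", "Hugepagesize")
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            fields[key] = int(rest.split()[0])  # Hugepagesize is in kB
    total_kib = fields["HugePages_Total"] * fields["Hugepagesize"]
    return {
        "pages": fields["HugePages_Total"],
        "free": fields["HugePages_Free"],
        "total_mib": total_kib // 1024,
    }


# Illustrative sample only; on a node, read the real file instead:
#   hugepage_summary(open("/proc/meminfo").read())
SAMPLE = """\
MemTotal:       16265596 kB
HugePages_Total:      50
HugePages_Free:       50
Hugepagesize:       2048 kB
"""
print(hugepage_summary(SAMPLE))
```

For the sample text, 50 pages of 2048 kB add up to 100 MiB, which matches the 100Mi reported as allocatable.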

Ref: https://docs.openshift.com/container-platform/4.16/post_installation_configuration/node-tasks.html
