I recently set up huge pages on my database server (MariaDB and Postgres) the recommended way, which required way too much rigamarole IMO. Add a parameter to the kernel command line to statically allocate a number of huge pages of a certain size. Create a group to access huge pages. Configure that group to access huge pages. Add mysqld etc. to the group. Configure the huge pages to be mounted as a virtual filesystem in /dev/ for some reason. Add corresponding configuration to the database to tell it to use huge pages and where to get them.
This should all just be a single boolean flag in the database config telling it to use huge pages which it gets from mmap dynamically. Why is any of the filesystem, permission, static allocation malarkey necessary?
Huge pages need contiguous free physical memory. If they aren't preallocated at boot time, the chances of finding a large enough contiguous region get slimmer as uptime grows, especially for 1G pages. It can get to the point where even services that start later during boot might not get them, because of external fragmentation caused by 4k page allocations.
While I can see why special permissions are needed to grab them, the whole filesystem thingy is clunky as hell. I have no idea why they didn't put them by default in /sys or /proc.
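One way to see the fragmentation problem in practice is to compare the number of huge pages you asked for with what the kernel reports in /proc/meminfo afterwards, since a runtime request can come up short. A rough sketch follows; the function names and the sample text are made up for illustration, and on a real system you would read /proc/meminfo instead.

```python
import re

def hugepage_counters(meminfo_text):
    """Extract the HugePages_* counters from /proc/meminfo-style text."""
    pairs = re.findall(r"^(HugePages_\w+):[ \t]+(\d+)[ \t]*$",
                       meminfo_text, re.MULTILINE)
    return {name: int(value) for name, value in pairs}

def shortfall(requested, meminfo_text):
    """How many requested huge pages the kernel did not manage to allocate."""
    total = hugepage_counters(meminfo_text).get("HugePages_Total", 0)
    return max(0, requested - total)

# Made-up sample in the /proc/meminfo layout, for illustration only.
SAMPLE = """\
MemTotal:       16303452 kB
HugePages_Total:     512
HugePages_Free:      512
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

print(shortfall(1024, SAMPLE))  # asked for 1024, pool has 512
```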
> This should all just be a single boolean flag in the database config telling it to use huge pages which it gets from mmap dynamically. Why is any of the filesystem, permission, static allocation malarkey necessary?
FWIW, those bits shouldn't be necessary with Postgres. If huge_pages is try or on, we'll specify MAP_HUGETLB (ORed with (shift & MAP_HUGE_MASK) << MAP_HUGE_SHIFT if huge_page_size is set). If mmap() fails, we'll either error out (=on) or fall back to a non-huge allocation (=try).
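To make the try/on behaviour concrete, here is a rough Python sketch of the same pattern. It is not Postgres' actual code. The names allocate_shared and huge_size_flag are made up, the numeric constants are the Linux values as I understand them for x86-64 and arm64 (Python's mmap module may not export them), and the 2 MiB default page size is an assumption.

```python
import mmap

# Assumed Linux constants (x86-64 / arm64); used when Python lacks them.
MAP_HUGETLB = getattr(mmap, "MAP_HUGETLB", 0x40000)
MAP_HUGE_SHIFT = 26
MAP_HUGE_MASK = 0x3F

def huge_size_flag(huge_page_size):
    """Encode log2(page size) into the mmap flag bits."""
    shift = huge_page_size.bit_length() - 1  # log2 for a power of two
    return (shift & MAP_HUGE_MASK) << MAP_HUGE_SHIFT

def allocate_shared(length, mode="try", huge_page_size=2 * 1024 * 1024,
                    explicit_size=False):
    """Return (mapping, used_huge_pages) for an anonymous shared mapping."""
    if mode not in ("off", "try", "on"):
        raise ValueError("mode must be 'off', 'try' or 'on'")
    flags = mmap.MAP_SHARED | mmap.MAP_ANONYMOUS
    if mode != "off":
        # munmap() needs a length that is a multiple of the huge page size.
        size = -(-length // huge_page_size) * huge_page_size
        huge_flags = flags | MAP_HUGETLB
        if explicit_size:
            huge_flags |= huge_size_flag(huge_page_size)
        try:
            return mmap.mmap(-1, size, flags=huge_flags), True
        except OSError:
            if mode == "on":
                raise  # "on": a failed huge page mapping is fatal
    # "off", or "try" after a failed attempt: plain 4k pages.
    return mmap.mmap(-1, length, flags=flags), False

buf, used_huge = allocate_shared(64 * 1024 * 1024, mode="try")
print("huge pages" if used_huge else "regular pages", len(buf))
buf.close()
```

With no huge pages reserved on the system, mode="try" should quietly come back with regular pages, while mode="on" raises the OSError from mmap().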
However, you do need to allocate huge pages at the system level for this to succeed. But that is indeed just a write to a /proc file (or a /sys file if you want more control): /proc/sys/vm/nr_hugepages, or /sys/kernel/mm/hugepages/hugepages-kB/nr_hugepages.
One of the more awkward bits about the kernel config is that these settings are counted in pages, not bytes, so you need to do the conversion yourself :(
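The conversion itself is just a rounded-up division. A small sketch, where pages_needed is a made-up helper and the 2048 kB page size is only the common x86-64 default (check Hugepagesize in /proc/meminfo for the real value):

```python
def pages_needed(bytes_needed, huge_page_kb=2048):
    """Number of huge pages required to cover bytes_needed, rounded up."""
    page_bytes = huge_page_kb * 1024
    return -(-bytes_needed // page_bytes)  # ceiling division

# 8 GiB of shared memory with 2048 kB huge pages
print(pages_needed(8 * 1024**3))        # 4096
# The same 8 GiB with 1 GiB (1048576 kB) huge pages
print(pages_needed(8 * 1024**3, 1048576))  # 8
```

The result is the number you would write to nr_hugepages, ideally with some headroom, since the database usually maps a bit more than just its main buffer pool.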