Generalised indexing scheme for variable memory layouts


May 6, 2025 - 16:59

structs are great, the only problem is that not only do they associate data (topo)logically, they also crystallise a memory layout for that set of concepts. I would like to avoid the latter.

So I'm implementing a concepts-based system. My concepts are like structs in that they define certain groupings of data, but unlike structs, they do not strictly define a memory layout, i.e. the ordering of members in that concept. Instead they provide field sizes (either as primitives or by reference to other concepts already defined), and hints as to the type of indexing needed for each array in each struct in the tree. These hints decide the runtime indexer functions, which in turn define the memory layout within some adequately sized flat buffer.

By changing the indexer, we change the layout, without touching existing user code, and without having to modify structs and (consequently) struct accesses across an entire codebase, since our accessors / indexers are type-agnostic.

Type-information Setup

My current implementation is presumably similar to any struct parser / compiler setup. We start to crystallise the struct, compacting bitfields, padding and aligning to key memory widths. (This begins to take away some control from indexers, which may be part of my problem.)

  1. Define the list of concepts. Where a concept's member is non-primitive, we pass a reference to another concept-type that is already defined. By referencing each other, these form a tree.

  2. Extract a set of unique types from the struct tree(s), and process them in descending depth order to:

  • calculate deeper concept / struct sizes first, so that shallower concepts / structs can have their sizes calculated thereafter,
  • align bitfields into a primitives-only struct header,
  • pad to word and cache line widths and
  • align all fields (header and arrays).
  3. Calculate the total tree size (after padding, alignment and bitpacking) and allocate a contiguous buffer of bytes / 32-bit integers for it.

  4. Extract and flatten all struct types' offsets, bit-lengths and bit-widths from the tree into a single, large, contiguous map of values, for fast runtime indexing.

Runtime Data Access

Data access works, but it's messy - heavy indexing occurs when we need to drill down, e.g.

for r = 0..numRegions
    for i = 0..numIsles
        for l = 0..numLinks
            isOpenVal = world.regions[r].isles[i].links[l].isOpen

becomes the following (member sizes, offsets and array factors are accessible inside the *Idx functions):

worldIdx = 0 //assume world is at start of heap

for r = 0..numRegions
    regionsIdx = worldIdx + membIdx(regions)
    regionIdx = regionsIdx + elemIdx(world, regions, r)
    
    for i = 0..numIsles
        islesIdx = regionIdx + membIdx(isles)
        isleIdx = islesIdx + elemIdx(region, isles, i)
        
        for l = 0..numLinks
            linksIdx = isleIdx + membIdx(links)
            linkIdx = linksIdx + elemIdx(isle, links, l)

            isOpenIdx = linkIdx + membIdx(isOpen)
            isOpenVal = heap[isOpenIdx]

        

So, for each level of data access, we have a couple of calls. We cache the partial index in each outer loop so we can reuse it in inner loops.

The challenge

It's hard to abstract indexing. Because we're calculating successive offsets and summing them into a final index, writes lay the data out in interleaved fashion.

I want instead to abstract to any layout, not just interleaved. I want to be able to flatten the tree partially or completely (to a series of flat arrays of numbers and nothing more).

(Partial tree-flattening would be e.g. flattening links into its containing isle or region; full flattening would be pushing this right down to the tree root or base of the heap, which here is represented by world, the single struct that contains all other structs.)

It should be possible to use different indexers at different levels, combined with interleaving, e.g. regions might be indexed linearly (flat); isles may be indexed using the Morton ordering / Z-order curve for spatial performance... again, these are defined in the concepts tree.

But then how to generalise user code indexers?

Thoughts

There exist plenty of non-interleaved indexing schemes for which we do not need to calculate offsets successively this way. In fact, this successive calculation is an optimisation on my part to avoid constantly recalculating the total offset to a given nested element like isle above: instead we calculate parts like regionIdx once and share them between all isles within that region. (I don't know whether C compilers also do this kind of partial index caching in their generated assembly.)

Interleaving, however, seems to be the most generalised approach of all; maybe that is why C uses it. At worst, the user can flatten their struct tree manually, without the C compiler's generalised struct layout and access needing to change: when you want your data mostly flat, you can put multiple arrays of primitives into a single very large struct.

The problem with that is maintenance: when you decide to change the overall structure used by your app, there's grunt work in modifying structs and the code that uses them... common these days, as we move data around between CPU- and GPU-compatible formats and optimise for each.

So perhaps I need to continue to use this interleaved style of access, even if, under the hood, the indexers do not necessarily lay the data out like that. That's the tough part to figure out.

The point is that I don't want any compiler (or my own compiler-like code) deciding the layout. By referring to my concepts, I want indexers to do so, without my having to touch either my concepts or my existing user code.

The purpose is to generate numerous different configurations and test their runtime performance to find the best for a given dataset.

How can I generalise indexing in user code, such that it can work with any possible indexing scheme at every level of the tree? Is there a framework for considering this problem?