The implication of this HW constraint on the programming model is that one cannot index dynamically across hardware registers: a vector register file can generally not be indexed dynamically. This is because the register file size is fixed and one either needs to unroll explicitly to obtain fixed register IDs or go through memory. This is a constraint familiar to CUDA programmers: declaring a private `float a[4];` and subsequently indexing it with a dynamic value results in so-called local memory usage (i.e. roundtripping to memory).
Implication on codegen
This raises the consequences of the static vs dynamic indexing discussion from earlier: extractelement, insertelement and shufflevector on n-D vectors in MLIR only support static indices. Dynamic indices are only supported on the most minor 1-D vector, but not on the outer (n-1)-D. For other cases, explicit load / stores are required.
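As a sketch (exact op spellings vary across MLIR versions; `vector.extract` and `vector.extractelement` are used here for illustration), the static vs dynamic distinction looks like:

```mlir
// Static indices into an n-D vector are supported directly.
%a = vector.extract %v[3, 2] : vector<4x8xf32>

// A dynamic index is only supported on a 1-D vector.
%b = vector.extractelement %w[%i : index] : vector<16xf32>
```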
1. Loops around vector values are indirect addressing of vector values; they must operate on explicit load / store operations over n-D vector types.
2. Once an n-D vector type is loaded into an SSA value (that may or may not live in n registers, with or without spilling, when eventually lowered), it may be unrolled to smaller k-D vector types and operations that correspond to the HW. This level of MLIR codegen is related to the register allocation and spilling that occur much later in the LLVM pipeline.
3. HW may support >1-D vectors with intrinsics for indirect addressing within these vectors. These can be targeted thanks to explicit vector.cast operations from MLIR k-D vector types and operations to LLVM 1-D vectors + intrinsics.
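Point 1. can be sketched as follows, assuming a hypothetical memref-backed buffer `%buf` (op names are illustrative and version-dependent):

```mlir
// Iterate over the rows of a 2-D value through explicit loads from
// memory, instead of dynamically indexing across an SSA vector value.
%c0 = arith.constant 0 : index
affine.for %i = 0 to 4 {
  %row = vector.load %buf[%i, %c0] : memref<4x8xf32>, vector<8xf32>
  // ... operate on the 1-D %row ...
}
```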
Instead, we argue that unconditionally lowering to a linearized abstraction hides away the codegen complexities related to memory accesses by giving a false impression of magical dynamic indexing across registers. We prefer to make those explicit in MLIR and allow codegen to explore tradeoffs. Different HW will require different tradeoffs in the sizes involved in steps 1., 2. and 3.
Decisions made at the MLIR level will have implications at a much later stage in LLVM (after register allocation). We do not envision exposing concerns related to modeling of register allocation and spilling in MLIR explicitly. Instead, each target will expose a set of "good" target operations and n-D vector types, associated with costs that PatternRewriters at the MLIR level will be able to target. Such costs at the MLIR level will be abstract and used for ranking, not for accurate performance modeling. In the future such costs will be learned.
Implication on Lowering to Accelerators
To target accelerators that support higher dimensional vectors natively, we can start from either 1-D or n-D vectors in MLIR and use vector.cast to flatten the most minor dimensions to a 1-D vector<Kxf32>, where K is an appropriate constant.
It is the role of an Accelerator-specific vector dialect (see the codegen flow in the figure above) to lower the vector.cast. Accelerator -> LLVM lowering would then consist of a bunch of Accelerator -> Accelerator rewrites to perform the casts, composed with Accelerator -> LLVM conversions + intrinsics that operate on 1-D vector<Kxf32>.
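For instance, a flattening cast on a hypothetical shape might look like the following (this document names the op vector.cast; current MLIR spells such reshapes `vector.shape_cast`):

```mlir
// Flatten the most minor dimensions into a single 1-D vector so that
// 1-D accelerator intrinsics can consume the value.
%flat = vector.cast %0 : vector<8x16xf32> to vector<128xf32>
```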
Some of those rewrites may need extra handling, especially if a reduction is involved. For example, vector.cast %0 : vector<K1x...xKnxf32> to vector<Kxf32> when K != K1 * ... * Kn.
However, vector.cast %0 : vector<K1x...xKnxf32> to vector<Kxf32> when K = K1 * ... * Kn should be close to a no-op.
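The no-op case, on hypothetical shapes:

```mlir
// K = K1 * K2 * K3 (64 = 4 * 4 * 4): same elements, same layout, so the
// cast is a pure reinterpretation and should lower to (almost) nothing.
%1 = vector.cast %0 : vector<4x4x4xf32> to vector<64xf32>
```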