[
{
"discussion_id": "2193701526",
"pr_number": 147588,
"pr_file": "llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp",
"created_at": "2025-07-09T01:10:54+00:00",
"commented_code": "NarrowSearchSpaceByPickingWinnerRegs();\n }\n \n+/// Sort LSRUses to address side effects of compile time optimization done in\n+/// SolveRecurse which filters out formulae not including required registers.\n+/// Such optimization makes the found best solution sensitive to the order\n+/// of LSRUses processing, hence it's important to ensure that that order\n+/// isn't random to avoid fluctuations and sub-optimal results.\n+///\n+/// Also check that all LSRUses have formulae as otherwise the situation is\n+/// unsolvable.\n+bool LSRInstance::SortLSRUses() {\n+ SmallVector<LSRUse *, 16> NewOrder;",
"repo_full_name": "llvm/llvm-project",
|
|
"discussion_comments": [
|
|
{
|
|
"comment_id": "2193701526",
|
|
"repo_full_name": "llvm/llvm-project",
|
|
"pr_number": 147588,
|
|
"pr_file": "llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp",
|
|
"discussion_id": "2193701526",
|
|
"commented_code": "@@ -5368,6 +5369,46 @@ void LSRInstance::NarrowSearchSpaceUsingHeuristics() {\n NarrowSearchSpaceByPickingWinnerRegs();\n }\n \n+/// Sort LSRUses to address side effects of compile time optimization done in\n+/// SolveRecurse which filters out formulae not including required registers.\n+/// Such optimization makes the found best solution sensitive to the order\n+/// of LSRUses processing, hence it's important to ensure that that order\n+/// isn't random to avoid fluctuations and sub-optimal results.\n+///\n+/// Also check that all LSRUses have formulae as otherwise the situation is\n+/// unsolvable.\n+bool LSRInstance::SortLSRUses() {\n+ SmallVector<LSRUse *, 16> NewOrder;",
"comment_created_at": "2025-07-09T01:10:54+00:00",
|
|
"comment_author": "arsenm",
|
|
"comment_body": "Why do you need this temporary vector? Can't you sort Uses in place? ",
|
|
"pr_file_module": null
|
|
},
|
|
{
|
|
"comment_id": "2194645815",
|
|
"repo_full_name": "llvm/llvm-project",
|
|
"pr_number": 147588,
|
|
"pr_file": "llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp",
|
|
"discussion_id": "2193701526",
|
|
"commented_code": "@@ -5368,6 +5369,46 @@ void LSRInstance::NarrowSearchSpaceUsingHeuristics() {\n NarrowSearchSpaceByPickingWinnerRegs();\n }\n \n+/// Sort LSRUses to address side effects of compile time optimization done in\n+/// SolveRecurse which filters out formulae not including required registers.\n+/// Such optimization makes the found best solution sensitive to the order\n+/// of LSRUses processing, hence it's important to ensure that that order\n+/// isn't random to avoid fluctuations and sub-optimal results.\n+///\n+/// Also check that all LSRUses have formulae as otherwise the situation is\n+/// unsolvable.\n+bool LSRInstance::SortLSRUses() {\n+ SmallVector<LSRUse *, 16> NewOrder;",
"comment_created_at": "2025-07-09T10:20:29+00:00",
|
|
"comment_author": "SergeyShch01",
|
|
"comment_body": "LSRUse is quite heavy (sizeof(LSRUse)=2184) while intermediate movement of objects can be done during sorting. Then it's faster to sort the array of pointers and establish the new order in the original array afterwards",
|
|
"pr_file_module": null
|
|
}
|
|
]
|
|
},
|
|
{
|
|
"discussion_id": "2222449730",
|
|
"pr_number": 147824,
|
|
"pr_file": "llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp",
|
|
"created_at": "2025-07-22T12:56:24+00:00",
|
|
"commented_code": "}\n \n Value *SCEVExpander::tryToReuseLCSSAPhi(const SCEVAddRecExpr *S) {\n+ Type *STy = S->getType();\n const Loop *L = S->getLoop();\n BasicBlock *EB = L->getExitBlock();\n if (!EB || !EB->getSinglePredecessor() ||\n !SE.DT.dominates(EB, Builder.GetInsertBlock()))\n return nullptr;\n \n for (auto &PN : EB->phis()) {\n- if (!SE.isSCEVable(PN.getType()) || PN.getType() != S->getType())\n+ if (!SE.isSCEVable(PN.getType()))\n continue;\n- auto *ExitV = SE.getSCEV(&PN);\n- if (S == ExitV)\n- return &PN;\n+ auto *ExitSCEV = SE.getSCEV(&PN);\n+ Type *PhiTy = PN.getType();\n+ if (STy->isIntegerTy() && PhiTy->isPointerTy())\n+ ExitSCEV = SE.getPtrToIntExpr(ExitSCEV, STy);\n+ else if (S->getType() != PN.getType())\n+ continue;\n+\n+ // Check if we can re-use the existing PN, by adjusting it with an expanded\n+ // offset, if the offset is simpler (for now just checks if it is\n+ // AddRec-free).\n+ const SCEV *Diff = SE.getMinusSCEV(S, ExitSCEV);\n+ if (isa<SCEVCouldNotCompute>(Diff) ||\n+ SCEVExprContains(Diff,\n+ [](const SCEV *S) { return isa<SCEVAddRecExpr>(S); }))\n+ continue;",
"repo_full_name": "llvm/llvm-project",
|
|
"discussion_comments": [
|
|
{
|
|
"comment_id": "2222449730",
|
|
"repo_full_name": "llvm/llvm-project",
|
|
"pr_number": 147824,
|
|
"pr_file": "llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp",
|
|
"discussion_id": "2222449730",
|
|
"commented_code": "@@ -1224,18 +1224,37 @@ Value *SCEVExpander::expandAddRecExprLiterally(const SCEVAddRecExpr *S) {\n }\n \n Value *SCEVExpander::tryToReuseLCSSAPhi(const SCEVAddRecExpr *S) {\n+ Type *STy = S->getType();\n const Loop *L = S->getLoop();\n BasicBlock *EB = L->getExitBlock();\n if (!EB || !EB->getSinglePredecessor() ||\n !SE.DT.dominates(EB, Builder.GetInsertBlock()))\n return nullptr;\n \n for (auto &PN : EB->phis()) {\n- if (!SE.isSCEVable(PN.getType()) || PN.getType() != S->getType())\n+ if (!SE.isSCEVable(PN.getType()))\n continue;\n- auto *ExitV = SE.getSCEV(&PN);\n- if (S == ExitV)\n- return &PN;\n+ auto *ExitSCEV = SE.getSCEV(&PN);\n+ Type *PhiTy = PN.getType();\n+ if (STy->isIntegerTy() && PhiTy->isPointerTy())\n+ ExitSCEV = SE.getPtrToIntExpr(ExitSCEV, STy);\n+ else if (S->getType() != PN.getType())\n+ continue;\n+\n+ // Check if we can re-use the existing PN, by adjusting it with an expanded\n+ // offset, if the offset is simpler (for now just checks if it is\n+ // AddRec-free).\n+ const SCEV *Diff = SE.getMinusSCEV(S, ExitSCEV);\n+ if (isa<SCEVCouldNotCompute>(Diff) ||\n+ SCEVExprContains(Diff,\n+ [](const SCEV *S) { return isa<SCEVAddRecExpr>(S); }))\n+ continue;",
"comment_created_at": "2025-07-22T12:56:24+00:00",
|
|
"comment_author": "nikic",
|
|
"comment_body": "Can we limit this to just constant offsets (and use computeConstantDifference)? This check excludes one particularly bad case, but other complex expansions may also be non-profitable.",
|
|
"pr_file_module": null
|
|
},
|
|
{
|
|
"comment_id": "2230626699",
|
|
"repo_full_name": "llvm/llvm-project",
|
|
"pr_number": 147824,
|
|
"pr_file": "llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp",
|
|
"discussion_id": "2222449730",
|
|
"commented_code": "@@ -1224,18 +1224,37 @@ Value *SCEVExpander::expandAddRecExprLiterally(const SCEVAddRecExpr *S) {\n }\n \n Value *SCEVExpander::tryToReuseLCSSAPhi(const SCEVAddRecExpr *S) {\n+ Type *STy = S->getType();\n const Loop *L = S->getLoop();\n BasicBlock *EB = L->getExitBlock();\n if (!EB || !EB->getSinglePredecessor() ||\n !SE.DT.dominates(EB, Builder.GetInsertBlock()))\n return nullptr;\n \n for (auto &PN : EB->phis()) {\n- if (!SE.isSCEVable(PN.getType()) || PN.getType() != S->getType())\n+ if (!SE.isSCEVable(PN.getType()))\n continue;\n- auto *ExitV = SE.getSCEV(&PN);\n- if (S == ExitV)\n- return &PN;\n+ auto *ExitSCEV = SE.getSCEV(&PN);\n+ Type *PhiTy = PN.getType();\n+ if (STy->isIntegerTy() && PhiTy->isPointerTy())\n+ ExitSCEV = SE.getPtrToIntExpr(ExitSCEV, STy);\n+ else if (S->getType() != PN.getType())\n+ continue;\n+\n+ // Check if we can re-use the existing PN, by adjusting it with an expanded\n+ // offset, if the offset is simpler (for now just checks if it is\n+ // AddRec-free).\n+ const SCEV *Diff = SE.getMinusSCEV(S, ExitSCEV);\n+ if (isa<SCEVCouldNotCompute>(Diff) ||\n+ SCEVExprContains(Diff,\n+ [](const SCEV *S) { return isa<SCEVAddRecExpr>(S); }))\n+ continue;",
"comment_created_at": "2025-07-25T09:27:40+00:00",
|
|
"comment_author": "fhahn",
|
|
"comment_body": "Agreed that checking just for not containing add-recs might be a bit over-optimistic. Restricting to constant difference on the other hand would mean we miss other profitable cases.\r\n\r\nI added a restricted this now to only allow SCEVConstant/SCEVUnknown values, and PtrToInt/negations of those. This should cover all cases I found for now, on a large test set with vectorization enabled .",
|
|
"pr_file_module": null
|
|
}
|
|
]
|
|
},
|
|
{
|
|
"discussion_id": "2237543183",
|
|
"pr_number": 151006,
|
|
"pr_file": "llvm/lib/Transforms/Utils/SSAUpdater.cpp",
|
|
"created_at": "2025-07-28T18:39:51+00:00",
|
|
"commented_code": "}\n } else {\n bool isFirstPred = true;\n- for (BasicBlock *PredBB : predecessors(BB)) {\n+\n+ // Sort predecessors to get deterministic PHI operand ordering.\n+ SmallVector<BasicBlock *, 8> SortedPreds(predecessors(BB));",
"repo_full_name": "llvm/llvm-project",
|
|
"discussion_comments": [
|
|
{
|
|
"comment_id": "2237543183",
|
|
"repo_full_name": "llvm/llvm-project",
|
|
"pr_number": 151006,
|
|
"pr_file": "llvm/lib/Transforms/Utils/SSAUpdater.cpp",
|
|
"discussion_id": "2237543183",
|
|
"commented_code": "@@ -122,7 +122,14 @@ Value *SSAUpdater::GetValueInMiddleOfBlock(BasicBlock *BB) {\n }\n } else {\n bool isFirstPred = true;\n- for (BasicBlock *PredBB : predecessors(BB)) {\n+\n+ // Sort predecessors to get deterministic PHI operand ordering.\n+ SmallVector<BasicBlock *, 8> SortedPreds(predecessors(BB));",
"comment_created_at": "2025-07-28T18:39:51+00:00",
|
|
"comment_author": "efriedma-quic",
|
|
"comment_body": "predecessors() should be deterministic: it just iterates over the use list of BB. Is the sort actually necessary?",
|
|
"pr_file_module": null
|
|
},
|
|
{
|
|
"comment_id": "2237565422",
|
|
"repo_full_name": "llvm/llvm-project",
|
|
"pr_number": 151006,
|
|
"pr_file": "llvm/lib/Transforms/Utils/SSAUpdater.cpp",
|
|
"discussion_id": "2237543183",
|
|
"commented_code": "@@ -122,7 +122,14 @@ Value *SSAUpdater::GetValueInMiddleOfBlock(BasicBlock *BB) {\n }\n } else {\n bool isFirstPred = true;\n- for (BasicBlock *PredBB : predecessors(BB)) {\n+\n+ // Sort predecessors to get deterministic PHI operand ordering.\n+ SmallVector<BasicBlock *, 8> SortedPreds(predecessors(BB));",
"comment_created_at": "2025-07-28T18:50:17+00:00",
|
|
"comment_author": "HackAttack",
|
|
"comment_body": "The problem is not with `predecessors()`, but with the contents of the use list itself. The insertion order into that underlying list is (I guess) what\u2019s nondeterministic.",
|
|
"pr_file_module": null
|
|
},
|
|
{
|
|
"comment_id": "2237579773",
|
|
"repo_full_name": "llvm/llvm-project",
|
|
"pr_number": 151006,
|
|
"pr_file": "llvm/lib/Transforms/Utils/SSAUpdater.cpp",
|
|
"discussion_id": "2237543183",
|
|
"commented_code": "@@ -122,7 +122,14 @@ Value *SSAUpdater::GetValueInMiddleOfBlock(BasicBlock *BB) {\n }\n } else {\n bool isFirstPred = true;\n- for (BasicBlock *PredBB : predecessors(BB)) {\n+\n+ // Sort predecessors to get deterministic PHI operand ordering.\n+ SmallVector<BasicBlock *, 8> SortedPreds(predecessors(BB));",
"comment_created_at": "2025-07-28T18:55:17+00:00",
|
|
"comment_author": "efriedma-quic",
|
|
"comment_body": "If the order of the use list isn't determinstic, we consider that a bug. (We output the order of that list if you use --preserve-ll-uselistorder.)",
|
|
"pr_file_module": null
|
|
}
|
|
]
|
|
},
|
|
{
|
|
"discussion_id": "2165117982",
|
|
"pr_number": 145613,
|
|
"pr_file": "llvm/lib/Transforms/IPO/AlwaysInliner.cpp",
|
|
"created_at": "2025-06-24T23:15:59+00:00",
|
|
"commented_code": "return Changed;\n }\n \n+/// Promote allocas to registers if possible.\n+static void promoteAllocas(\n+ Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,\n+ function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) {\n+ if (AllocasToPromote.empty())\n+ return;\n+\n+ SmallVector<AllocaInst *, 4> PromotableAllocas;\n+ llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas),\n+ isAllocaPromotable);\n+ if (PromotableAllocas.empty())\n+ return;\n+\n+ DominatorTree DT(*Caller);\n+ AssumptionCache &AC = GetAssumptionCache(*Caller);\n+ PromoteMemToReg(PromotableAllocas, DT, &AC);\n+ NumAllocasPromoted += PromotableAllocas.size();\n+ // Emit a remark for the promotion.\n+ OptimizationRemarkEmitter ORE(Caller);\n+ DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc();\n+ ORE.emit([&]() {\n+ return OptimizationRemark(DEBUG_TYPE, \"PromoteAllocas\", DLoc,\n+ &Caller->getEntryBlock())\n+ << \"Promoting \" << ore::NV(\"NumAlloca\", PromotableAllocas.size())\n+ << \" allocas to SSA registers in function '\"\n+ << ore::NV(\"Function\", Caller) << \"'\";\n+ });\n+ LLVM_DEBUG(dbgs() << \"Promoted \" << PromotableAllocas.size()\n+ << \" allocas to registers in function \" << Caller->getName()\n+ << \"\n\");\n+}\n+\n+/// We use a different visitation order of functions here to solve a phase\n+/// ordering problem. After inlining, a caller function may have allocas that\n+/// were previously used for passing reference arguments to the callee that\n+/// are now promotable to registers, using SROA/mem2reg. However if we just let\n+/// the AlwaysInliner continue inlining everything at once, the later SROA pass\n+/// in the pipeline will end up placing phis for these allocas into blocks along\n+/// the dominance frontier which may extend further than desired (e.g. loop\n+/// headers). This can happen when the caller is then inlined into another\n+/// caller, and the allocas end up hoisted further before SROA is run.\n+///\n+/// Instead what we want is to try to do, as best as we can, is to inline leaf\n+/// functions into callers, and then run PromoteMemToReg() on the allocas that\n+/// were passed into the callee before it was inlined.\n+///\n+/// We want to do this *before* the caller is inlined into another caller\n+/// because we want the alloca promotion to happen before its scope extends too\n+/// far because of further inlining.\n+///\n+/// Here's a simple pseudo-example:\n+/// outermost_caller() {\n+/// for (...) {\n+/// middle_caller();\n+/// }\n+/// }\n+///\n+/// middle_caller() {\n+/// int stack_var;\n+/// inner_callee(&stack_var);\n+/// }\n+///\n+/// inner_callee(int *x) {\n+/// // Do something with x.\n+/// }\n+///\n+/// In this case, we want to inline inner_callee() into middle_caller() and\n+/// then promote stack_var to a register before we inline middle_caller() into\n+/// outermost_caller(). The regular always_inliner would inline everything at\n+/// once, and then SROA/mem2reg would promote stack_var to a register but in\n+/// the context of outermost_caller() which is not what we want.",
"repo_full_name": "llvm/llvm-project",
|
|
"discussion_comments": [
|
|
{
|
|
"comment_id": "2165117982",
|
|
"repo_full_name": "llvm/llvm-project",
|
|
"pr_number": 145613,
|
|
"pr_file": "llvm/lib/Transforms/IPO/AlwaysInliner.cpp",
|
|
"discussion_id": "2165117982",
|
|
"commented_code": "@@ -129,6 +147,245 @@ bool AlwaysInlineImpl(\n return Changed;\n }\n \n+/// Promote allocas to registers if possible.\n+static void promoteAllocas(\n+ Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,\n+ function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) {\n+ if (AllocasToPromote.empty())\n+ return;\n+\n+ SmallVector<AllocaInst *, 4> PromotableAllocas;\n+ llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas),\n+ isAllocaPromotable);\n+ if (PromotableAllocas.empty())\n+ return;\n+\n+ DominatorTree DT(*Caller);\n+ AssumptionCache &AC = GetAssumptionCache(*Caller);\n+ PromoteMemToReg(PromotableAllocas, DT, &AC);\n+ NumAllocasPromoted += PromotableAllocas.size();\n+ // Emit a remark for the promotion.\n+ OptimizationRemarkEmitter ORE(Caller);\n+ DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc();\n+ ORE.emit([&]() {\n+ return OptimizationRemark(DEBUG_TYPE, \"PromoteAllocas\", DLoc,\n+ &Caller->getEntryBlock())\n+ << \"Promoting \" << ore::NV(\"NumAlloca\", PromotableAllocas.size())\n+ << \" allocas to SSA registers in function '\"\n+ << ore::NV(\"Function\", Caller) << \"'\";\n+ });\n+ LLVM_DEBUG(dbgs() << \"Promoted \" << PromotableAllocas.size()\n+ << \" allocas to registers in function \" << Caller->getName()\n+ << \"\\n\");\n+}\n+\n+/// We use a different visitation order of functions here to solve a phase\n+/// ordering problem. After inlining, a caller function may have allocas that\n+/// were previously used for passing reference arguments to the callee that\n+/// are now promotable to registers, using SROA/mem2reg. However if we just let\n+/// the AlwaysInliner continue inlining everything at once, the later SROA pass\n+/// in the pipeline will end up placing phis for these allocas into blocks along\n+/// the dominance frontier which may extend further than desired (e.g. loop\n+/// headers). This can happen when the caller is then inlined into another\n+/// caller, and the allocas end up hoisted further before SROA is run.\n+///\n+/// Instead what we want is to try to do, as best as we can, is to inline leaf\n+/// functions into callers, and then run PromoteMemToReg() on the allocas that\n+/// were passed into the callee before it was inlined.\n+///\n+/// We want to do this *before* the caller is inlined into another caller\n+/// because we want the alloca promotion to happen before its scope extends too\n+/// far because of further inlining.\n+///\n+/// Here's a simple pseudo-example:\n+/// outermost_caller() {\n+/// for (...) {\n+/// middle_caller();\n+/// }\n+/// }\n+///\n+/// middle_caller() {\n+/// int stack_var;\n+/// inner_callee(&stack_var);\n+/// }\n+///\n+/// inner_callee(int *x) {\n+/// // Do something with x.\n+/// }\n+///\n+/// In this case, we want to inline inner_callee() into middle_caller() and\n+/// then promote stack_var to a register before we inline middle_caller() into\n+/// outermost_caller(). The regular always_inliner would inline everything at\n+/// once, and then SROA/mem2reg would promote stack_var to a register but in\n+/// the context of outermost_caller() which is not what we want.",
"comment_created_at": "2025-06-24T23:15:59+00:00",
|
|
"comment_author": "mtrofin",
|
|
"comment_body": "Could you expand a bit more why this (promoting `stack_var` in `outermost_caller`) is bad?",
|
|
"pr_file_module": null
|
|
},
|
|
{
|
|
"comment_id": "2165683351",
|
|
"repo_full_name": "llvm/llvm-project",
|
|
"pr_number": 145613,
|
|
"pr_file": "llvm/lib/Transforms/IPO/AlwaysInliner.cpp",
|
|
"discussion_id": "2165117982",
|
|
"commented_code": "@@ -129,6 +147,245 @@ bool AlwaysInlineImpl(\n return Changed;\n }\n \n+/// Promote allocas to registers if possible.\n+static void promoteAllocas(\n+ Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,\n+ function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) {\n+ if (AllocasToPromote.empty())\n+ return;\n+\n+ SmallVector<AllocaInst *, 4> PromotableAllocas;\n+ llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas),\n+ isAllocaPromotable);\n+ if (PromotableAllocas.empty())\n+ return;\n+\n+ DominatorTree DT(*Caller);\n+ AssumptionCache &AC = GetAssumptionCache(*Caller);\n+ PromoteMemToReg(PromotableAllocas, DT, &AC);\n+ NumAllocasPromoted += PromotableAllocas.size();\n+ // Emit a remark for the promotion.\n+ OptimizationRemarkEmitter ORE(Caller);\n+ DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc();\n+ ORE.emit([&]() {\n+ return OptimizationRemark(DEBUG_TYPE, \"PromoteAllocas\", DLoc,\n+ &Caller->getEntryBlock())\n+ << \"Promoting \" << ore::NV(\"NumAlloca\", PromotableAllocas.size())\n+ << \" allocas to SSA registers in function '\"\n+ << ore::NV(\"Function\", Caller) << \"'\";\n+ });\n+ LLVM_DEBUG(dbgs() << \"Promoted \" << PromotableAllocas.size()\n+ << \" allocas to registers in function \" << Caller->getName()\n+ << \"\\n\");\n+}\n+\n+/// We use a different visitation order of functions here to solve a phase\n+/// ordering problem. After inlining, a caller function may have allocas that\n+/// were previously used for passing reference arguments to the callee that\n+/// are now promotable to registers, using SROA/mem2reg. However if we just let\n+/// the AlwaysInliner continue inlining everything at once, the later SROA pass\n+/// in the pipeline will end up placing phis for these allocas into blocks along\n+/// the dominance frontier which may extend further than desired (e.g. loop\n+/// headers). This can happen when the caller is then inlined into another\n+/// caller, and the allocas end up hoisted further before SROA is run.\n+///\n+/// Instead what we want is to try to do, as best as we can, is to inline leaf\n+/// functions into callers, and then run PromoteMemToReg() on the allocas that\n+/// were passed into the callee before it was inlined.\n+///\n+/// We want to do this *before* the caller is inlined into another caller\n+/// because we want the alloca promotion to happen before its scope extends too\n+/// far because of further inlining.\n+///\n+/// Here's a simple pseudo-example:\n+/// outermost_caller() {\n+/// for (...) {\n+/// middle_caller();\n+/// }\n+/// }\n+///\n+/// middle_caller() {\n+/// int stack_var;\n+/// inner_callee(&stack_var);\n+/// }\n+///\n+/// inner_callee(int *x) {\n+/// // Do something with x.\n+/// }\n+///\n+/// In this case, we want to inline inner_callee() into middle_caller() and\n+/// then promote stack_var to a register before we inline middle_caller() into\n+/// outermost_caller(). The regular always_inliner would inline everything at\n+/// once, and then SROA/mem2reg would promote stack_var to a register but in\n+/// the context of outermost_caller() which is not what we want.",
"comment_created_at": "2025-06-25T04:19:12+00:00",
|
|
"comment_author": "aemerson",
|
|
"comment_body": "Sure. The problem is that mem2reg promotion has to place phi nodes for the value along the dominance frontier. This frontier is different depending on inlining order. For allocas, what you want is to insert phis when the size of the dominance frontier is as small as possible. The motivation is that allocas inside nested loops can \"leak\" phis beyond the innermost loop header, and that's bad for register pressure.\r\n\r\nThe main inliner already handles this because the pass manager interleaves optimizations with inlining, but for always-inliner we don't have that capability.",
|
|
"pr_file_module": null
|
|
},
|
|
{
|
|
"comment_id": "2167060428",
|
|
"repo_full_name": "llvm/llvm-project",
|
|
"pr_number": 145613,
|
|
"pr_file": "llvm/lib/Transforms/IPO/AlwaysInliner.cpp",
|
|
"discussion_id": "2165117982",
|
|
"commented_code": "@@ -129,6 +147,245 @@ bool AlwaysInlineImpl(\n return Changed;\n }\n \n+/// Promote allocas to registers if possible.\n+static void promoteAllocas(\n+ Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,\n+ function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) {\n+ if (AllocasToPromote.empty())\n+ return;\n+\n+ SmallVector<AllocaInst *, 4> PromotableAllocas;\n+ llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas),\n+ isAllocaPromotable);\n+ if (PromotableAllocas.empty())\n+ return;\n+\n+ DominatorTree DT(*Caller);\n+ AssumptionCache &AC = GetAssumptionCache(*Caller);\n+ PromoteMemToReg(PromotableAllocas, DT, &AC);\n+ NumAllocasPromoted += PromotableAllocas.size();\n+ // Emit a remark for the promotion.\n+ OptimizationRemarkEmitter ORE(Caller);\n+ DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc();\n+ ORE.emit([&]() {\n+ return OptimizationRemark(DEBUG_TYPE, \"PromoteAllocas\", DLoc,\n+ &Caller->getEntryBlock())\n+ << \"Promoting \" << ore::NV(\"NumAlloca\", PromotableAllocas.size())\n+ << \" allocas to SSA registers in function '\"\n+ << ore::NV(\"Function\", Caller) << \"'\";\n+ });\n+ LLVM_DEBUG(dbgs() << \"Promoted \" << PromotableAllocas.size()\n+ << \" allocas to registers in function \" << Caller->getName()\n+ << \"\\n\");\n+}\n+\n+/// We use a different visitation order of functions here to solve a phase\n+/// ordering problem. After inlining, a caller function may have allocas that\n+/// were previously used for passing reference arguments to the callee that\n+/// are now promotable to registers, using SROA/mem2reg. However if we just let\n+/// the AlwaysInliner continue inlining everything at once, the later SROA pass\n+/// in the pipeline will end up placing phis for these allocas into blocks along\n+/// the dominance frontier which may extend further than desired (e.g. loop\n+/// headers). This can happen when the caller is then inlined into another\n+/// caller, and the allocas end up hoisted further before SROA is run.\n+///\n+/// Instead what we want is to try to do, as best as we can, is to inline leaf\n+/// functions into callers, and then run PromoteMemToReg() on the allocas that\n+/// were passed into the callee before it was inlined.\n+///\n+/// We want to do this *before* the caller is inlined into another caller\n+/// because we want the alloca promotion to happen before its scope extends too\n+/// far because of further inlining.\n+///\n+/// Here's a simple pseudo-example:\n+/// outermost_caller() {\n+/// for (...) {\n+/// middle_caller();\n+/// }\n+/// }\n+///\n+/// middle_caller() {\n+/// int stack_var;\n+/// inner_callee(&stack_var);\n+/// }\n+///\n+/// inner_callee(int *x) {\n+/// // Do something with x.\n+/// }\n+///\n+/// In this case, we want to inline inner_callee() into middle_caller() and\n+/// then promote stack_var to a register before we inline middle_caller() into\n+/// outermost_caller(). The regular always_inliner would inline everything at\n+/// once, and then SROA/mem2reg would promote stack_var to a register but in\n+/// the context of outermost_caller() which is not what we want.",
"comment_created_at": "2025-06-25T15:50:49+00:00",
|
|
"comment_author": "mtrofin",
|
|
"comment_body": "Thanks. We have been experimenting with other traversal orders (hence `ModuleInliner.cpp`) and this aspect is good to keep in mind. In that context, could the problem addressed here be decoupled from inlining order? It seems like it'd result in a more robust system.\r\n\r\n(I'm not trying to scope-creep, rather want to understand what options we have, and that doesn't have to impact what we do right now)",
|
|
"pr_file_module": null
|
|
},
|
|
{
|
|
"comment_id": "2167269345",
|
|
"repo_full_name": "llvm/llvm-project",
|
|
"pr_number": 145613,
|
|
"pr_file": "llvm/lib/Transforms/IPO/AlwaysInliner.cpp",
|
|
"discussion_id": "2165117982",
|
|
"commented_code": "@@ -129,6 +147,245 @@ bool AlwaysInlineImpl(\n return Changed;\n }\n \n+/// Promote allocas to registers if possible.\n+static void promoteAllocas(\n+ Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,\n+ function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) {\n+ if (AllocasToPromote.empty())\n+ return;\n+\n+ SmallVector<AllocaInst *, 4> PromotableAllocas;\n+ llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas),\n+ isAllocaPromotable);\n+ if (PromotableAllocas.empty())\n+ return;\n+\n+ DominatorTree DT(*Caller);\n+ AssumptionCache &AC = GetAssumptionCache(*Caller);\n+ PromoteMemToReg(PromotableAllocas, DT, &AC);\n+ NumAllocasPromoted += PromotableAllocas.size();\n+ // Emit a remark for the promotion.\n+ OptimizationRemarkEmitter ORE(Caller);\n+ DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc();\n+ ORE.emit([&]() {\n+ return OptimizationRemark(DEBUG_TYPE, \"PromoteAllocas\", DLoc,\n+ &Caller->getEntryBlock())\n+ << \"Promoting \" << ore::NV(\"NumAlloca\", PromotableAllocas.size())\n+ << \" allocas to SSA registers in function '\"\n+ << ore::NV(\"Function\", Caller) << \"'\";\n+ });\n+ LLVM_DEBUG(dbgs() << \"Promoted \" << PromotableAllocas.size()\n+ << \" allocas to registers in function \" << Caller->getName()\n+ << \"\\n\");\n+}\n+\n+/// We use a different visitation order of functions here to solve a phase\n+/// ordering problem. After inlining, a caller function may have allocas that\n+/// were previously used for passing reference arguments to the callee that\n+/// are now promotable to registers, using SROA/mem2reg. However if we just let\n+/// the AlwaysInliner continue inlining everything at once, the later SROA pass\n+/// in the pipeline will end up placing phis for these allocas into blocks along\n+/// the dominance frontier which may extend further than desired (e.g. loop\n+/// headers). This can happen when the caller is then inlined into another\n+/// caller, and the allocas end up hoisted further before SROA is run.\n+///\n+/// Instead what we want is to try to do, as best as we can, is to inline leaf\n+/// functions into callers, and then run PromoteMemToReg() on the allocas that\n+/// were passed into the callee before it was inlined.\n+///\n+/// We want to do this *before* the caller is inlined into another caller\n+/// because we want the alloca promotion to happen before its scope extends too\n+/// far because of further inlining.\n+///\n+/// Here's a simple pseudo-example:\n+/// outermost_caller() {\n+/// for (...) {\n+/// middle_caller();\n+/// }\n+/// }\n+///\n+/// middle_caller() {\n+/// int stack_var;\n+/// inner_callee(&stack_var);\n+/// }\n+///\n+/// inner_callee(int *x) {\n+/// // Do something with x.\n+/// }\n+///\n+/// In this case, we want to inline inner_callee() into middle_caller() and\n+/// then promote stack_var to a register before we inline middle_caller() into\n+/// outermost_caller(). The regular always_inliner would inline everything at\n+/// once, and then SROA/mem2reg would promote stack_var to a register but in\n+/// the context of outermost_caller() which is not what we want.",
"comment_created_at": "2025-06-25T17:38:04+00:00",
|
|
"comment_author": "aemerson",
|
|
"comment_body": "> In that context, could the problem addressed here be decoupled from inlining order? It seems like it'd result in a more robust system.\r\n\r\nI don't *think* so, unless there's something I've missed. Before doing this I tried other approaches, such as:\r\n - Trying to detect these over-extended PHIs and then demoting them back to allocas. Didn't work as we end up pessimizing codegen.\r\n - Avoiding hoisting large vector allocas to the entry block, in order to block mem2reg. This works but is conceptually the wrong place to do it (no other heuristics code exists there).\r\n\r\nI wasn't aware of ModuleInliner. Is the long term plan for it to replace the existing inliner? If so we could in future merge it with AlwaysInliner and if we interleave optimization as the current SCC manager does then this should fix the problem.",
"pr_file_module": null
|
|
},
|
|
{
|
|
"comment_id": "2167456132",
|
|
"repo_full_name": "llvm/llvm-project",
|
|
"pr_number": 145613,
|
|
"pr_file": "llvm/lib/Transforms/IPO/AlwaysInliner.cpp",
|
|
"discussion_id": "2165117982",
|
|
"commented_code": "@@ -129,6 +147,245 @@ bool AlwaysInlineImpl(\n return Changed;\n }\n \n+/// Promote allocas to registers if possible.\n+static void promoteAllocas(\n+ Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,\n+ function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) {\n+ if (AllocasToPromote.empty())\n+ return;\n+\n+ SmallVector<AllocaInst *, 4> PromotableAllocas;\n+ llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas),\n+ isAllocaPromotable);\n+ if (PromotableAllocas.empty())\n+ return;\n+\n+ DominatorTree DT(*Caller);\n+ AssumptionCache &AC = GetAssumptionCache(*Caller);\n+ PromoteMemToReg(PromotableAllocas, DT, &AC);\n+ NumAllocasPromoted += PromotableAllocas.size();\n+ // Emit a remark for the promotion.\n+ OptimizationRemarkEmitter ORE(Caller);\n+ DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc();\n+ ORE.emit([&]() {\n+ return OptimizationRemark(DEBUG_TYPE, \"PromoteAllocas\", DLoc,\n+ &Caller->getEntryBlock())\n+ << \"Promoting \" << ore::NV(\"NumAlloca\", PromotableAllocas.size())\n+ << \" allocas to SSA registers in function '\"\n+ << ore::NV(\"Function\", Caller) << \"'\";\n+ });\n+ LLVM_DEBUG(dbgs() << \"Promoted \" << PromotableAllocas.size()\n+ << \" allocas to registers in function \" << Caller->getName()\n+ << \"\\n\");\n+}\n+\n+/// We use a different visitation order of functions here to solve a phase\n+/// ordering problem. After inlining, a caller function may have allocas that\n+/// were previously used for passing reference arguments to the callee that\n+/// are now promotable to registers, using SROA/mem2reg. However if we just let\n+/// the AlwaysInliner continue inlining everything at once, the later SROA pass\n+/// in the pipeline will end up placing phis for these allocas into blocks along\n+/// the dominance frontier which may extend further than desired (e.g. loop\n+/// headers). This can happen when the caller is then inlined into another\n+/// caller, and the allocas end up hoisted further before SROA is run.\n+///\n+/// Instead what we want is to try to do, as best as we can, is to inline leaf\n+/// functions into callers, and then run PromoteMemToReg() on the allocas that\n+/// were passed into the callee before it was inlined.\n+///\n+/// We want to do this *before* the caller is inlined into another caller\n+/// because we want the alloca promotion to happen before its scope extends too\n+/// far because of further inlining.\n+///\n+/// Here's a simple pseudo-example:\n+/// outermost_caller() {\n+/// for (...) {\n+/// middle_caller();\n+/// }\n+/// }\n+///\n+/// middle_caller() {\n+/// int stack_var;\n+/// inner_callee(&stack_var);\n+/// }\n+///\n+/// inner_callee(int *x) {\n+/// // Do something with x.\n+/// }\n+///\n+/// In this case, we want to inline inner_callee() into middle_caller() and\n+/// then promote stack_var to a register before we inline middle_caller() into\n+/// outermost_caller(). The regular always_inliner would inline everything at\n+/// once, and then SROA/mem2reg would promote stack_var to a register but in\n+/// the context of outermost_caller() which is not what we want.",
"comment_created_at": "2025-06-25T19:24:46+00:00",
|
|
"comment_author": "mtrofin",
|
|
"comment_body": "There's no plan yet with the ModuleInliner, currently it lets us experiment with alternative traversals, and some of them have been showing promise.\r\n\r\nI'm mainly trying to understand if:\r\n\r\n- the order of traversal matters (for this problem here)\r\n- do all the function simplification passes need to be run after some inlining or just some? I'm guessing it's really \"just a specific subset\", correct? ",
"pr_file_module": null
|
|
},
|
|
{
|
|
"comment_id": "2167464978",
|
|
"repo_full_name": "llvm/llvm-project",
|
|
"pr_number": 145613,
|
|
"pr_file": "llvm/lib/Transforms/IPO/AlwaysInliner.cpp",
|
|
"discussion_id": "2165117982",
|
|
"commented_code": "@@ -129,6 +147,245 @@ bool AlwaysInlineImpl(\n return Changed;\n }\n \n+/// Promote allocas to registers if possible.\n+static void promoteAllocas(\n+ Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,\n+ function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) {\n+ if (AllocasToPromote.empty())\n+ return;\n+\n+ SmallVector<AllocaInst *, 4> PromotableAllocas;\n+ llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas),\n+ isAllocaPromotable);\n+ if (PromotableAllocas.empty())\n+ return;\n+\n+ DominatorTree DT(*Caller);\n+ AssumptionCache &AC = GetAssumptionCache(*Caller);\n+ PromoteMemToReg(PromotableAllocas, DT, &AC);\n+ NumAllocasPromoted += PromotableAllocas.size();\n+ // Emit a remark for the promotion.\n+ OptimizationRemarkEmitter ORE(Caller);\n+ DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc();\n+ ORE.emit([&]() {\n+ return OptimizationRemark(DEBUG_TYPE, \"PromoteAllocas\", DLoc,\n+ &Caller->getEntryBlock())\n+ << \"Promoting \" << ore::NV(\"NumAlloca\", PromotableAllocas.size())\n+ << \" allocas to SSA registers in function '\"\n+ << ore::NV(\"Function\", Caller) << \"'\";\n+ });\n+ LLVM_DEBUG(dbgs() << \"Promoted \" << PromotableAllocas.size()\n+ << \" allocas to registers in function \" << Caller->getName()\n+ << \"\\n\");\n+}\n+\n+/// We use a different visitation order of functions here to solve a phase\n+/// ordering problem. After inlining, a caller function may have allocas that\n+/// were previously used for passing reference arguments to the callee that\n+/// are now promotable to registers, using SROA/mem2reg. However if we just let\n+/// the AlwaysInliner continue inlining everything at once, the later SROA pass\n+/// in the pipeline will end up placing phis for these allocas into blocks along\n+/// the dominance frontier which may extend further than desired (e.g. loop\n+/// headers). This can happen when the caller is then inlined into another\n+/// caller, and the allocas end up hoisted further before SROA is run.\n+///\n+/// Instead what we want is to try to do, as best as we can, is to inline leaf\n+/// functions into callers, and then run PromoteMemToReg() on the allocas that\n+/// were passed into the callee before it was inlined.\n+///\n+/// We want to do this *before* the caller is inlined into another caller\n+/// because we want the alloca promotion to happen before its scope extends too\n+/// far because of further inlining.\n+///\n+/// Here's a simple pseudo-example:\n+/// outermost_caller() {\n+/// for (...) {\n+/// middle_caller();\n+/// }\n+/// }\n+///\n+/// middle_caller() {\n+/// int stack_var;\n+/// inner_callee(&stack_var);\n+/// }\n+///\n+/// inner_callee(int *x) {\n+/// // Do something with x.\n+/// }\n+///\n+/// In this case, we want to inline inner_callee() into middle_caller() and\n+/// then promote stack_var to a register before we inline middle_caller() into\n+/// outermost_caller(). The regular always_inliner would inline everything at\n+/// once, and then SROA/mem2reg would promote stack_var to a register but in\n+/// the context of outermost_caller() which is not what we want.",
"comment_created_at": "2025-06-25T19:30:18+00:00",
|
|
"comment_author": "aemerson",
|
|
"comment_body": "Yes the traversal order matters here, because for optimal codegen we want mem2reg to happen between the inner->middle and middle->outer inlines. If you do it the other way around mem2reg can't do anything until the final inner->outer inline and by that point it's too late.\n\nFor now I think only this promotion is a known issue, I don't know of general issues with simplification.",
|
|
"pr_file_module": null
|
|
},
|
|
{
|
|
"comment_id": "2167528489",
|
|
"repo_full_name": "llvm/llvm-project",
|
|
"pr_number": 145613,
|
|
"pr_file": "llvm/lib/Transforms/IPO/AlwaysInliner.cpp",
|
|
"discussion_id": "2165117982",
|
|
"commented_code": "@@ -129,6 +147,245 @@ bool AlwaysInlineImpl(\n return Changed;\n }\n \n+/// Promote allocas to registers if possible.\n+static void promoteAllocas(\n+ Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,\n+ function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) {\n+ if (AllocasToPromote.empty())\n+ return;\n+\n+ SmallVector<AllocaInst *, 4> PromotableAllocas;\n+ llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas),\n+ isAllocaPromotable);\n+ if (PromotableAllocas.empty())\n+ return;\n+\n+ DominatorTree DT(*Caller);\n+ AssumptionCache &AC = GetAssumptionCache(*Caller);\n+ PromoteMemToReg(PromotableAllocas, DT, &AC);\n+ NumAllocasPromoted += PromotableAllocas.size();\n+ // Emit a remark for the promotion.\n+ OptimizationRemarkEmitter ORE(Caller);\n+ DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc();\n+ ORE.emit([&]() {\n+ return OptimizationRemark(DEBUG_TYPE, \"PromoteAllocas\", DLoc,\n+ &Caller->getEntryBlock())\n+ << \"Promoting \" << ore::NV(\"NumAlloca\", PromotableAllocas.size())\n+ << \" allocas to SSA registers in function '\"\n+ << ore::NV(\"Function\", Caller) << \"'\";\n+ });\n+ LLVM_DEBUG(dbgs() << \"Promoted \" << PromotableAllocas.size()\n+ << \" allocas to registers in function \" << Caller->getName()\n+ << \"\\n\");\n+}\n+\n+/// We use a different visitation order of functions here to solve a phase\n+/// ordering problem. After inlining, a caller function may have allocas that\n+/// were previously used for passing reference arguments to the callee that\n+/// are now promotable to registers, using SROA/mem2reg. However if we just let\n+/// the AlwaysInliner continue inlining everything at once, the later SROA pass\n+/// in the pipeline will end up placing phis for these allocas into blocks along\n+/// the dominance frontier which may extend further than desired (e.g. loop\n+/// headers). This can happen when the caller is then inlined into another\n+/// caller, and the allocas end up hoisted further before SROA is run.\n+///\n+/// Instead what we want is to try to do, as best as we can, is to inline leaf\n+/// functions into callers, and then run PromoteMemToReg() on the allocas that\n+/// were passed into the callee before it was inlined.\n+///\n+/// We want to do this *before* the caller is inlined into another caller\n+/// because we want the alloca promotion to happen before its scope extends too\n+/// far because of further inlining.\n+///\n+/// Here's a simple pseudo-example:\n+/// outermost_caller() {\n+/// for (...) {\n+/// middle_caller();\n+/// }\n+/// }\n+///\n+/// middle_caller() {\n+/// int stack_var;\n+/// inner_callee(&stack_var);\n+/// }\n+///\n+/// inner_callee(int *x) {\n+/// // Do something with x.\n+/// }\n+///\n+/// In this case, we want to inline inner_callee() into middle_caller() and\n+/// then promote stack_var to a register before we inline middle_caller() into\n+/// outermost_caller(). The regular always_inliner would inline everything at\n+/// once, and then SROA/mem2reg would promote stack_var to a register but in\n+/// the context of outermost_caller() which is not what we want.",
"comment_created_at": "2025-06-25T20:03:26+00:00",
|
|
"comment_author": "mtrofin",
|
|
"comment_body": "Ack, so that means that ModuleInliner running interleaved simplifications won't help (if the order isn't bottom-up traversal).\r\n\r\n(picking on \"Avoiding hoisting large vector allocas to the entry block\") this happens in vectorizable kind of code? Asking (to learn) because in the experiments I mentioned, when we changed traversal order, we also (orthogonally) postponed function simplification to after all inlining was done, with no discernable performance effect, but for datacenter kind of apps, though.\r\n\r\n(also to learn/understand) by delaying promoting the allocas, you're kind of pre-spilling live ranges, right?",
"pr_file_module": null
|
|
},
|
|
{
|
|
"comment_id": "2167728370",
|
|
"repo_full_name": "llvm/llvm-project",
|
|
"pr_number": 145613,
|
|
"pr_file": "llvm/lib/Transforms/IPO/AlwaysInliner.cpp",
|
|
"discussion_id": "2165117982",
|
|
"commented_code": "@@ -129,6 +147,245 @@ bool AlwaysInlineImpl(\n return Changed;\n }\n \n+/// Promote allocas to registers if possible.\n+static void promoteAllocas(\n+ Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,\n+ function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) {\n+ if (AllocasToPromote.empty())\n+ return;\n+\n+ SmallVector<AllocaInst *, 4> PromotableAllocas;\n+ llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas),\n+ isAllocaPromotable);\n+ if (PromotableAllocas.empty())\n+ return;\n+\n+ DominatorTree DT(*Caller);\n+ AssumptionCache &AC = GetAssumptionCache(*Caller);\n+ PromoteMemToReg(PromotableAllocas, DT, &AC);\n+ NumAllocasPromoted += PromotableAllocas.size();\n+ // Emit a remark for the promotion.\n+ OptimizationRemarkEmitter ORE(Caller);\n+ DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc();\n+ ORE.emit([&]() {\n+ return OptimizationRemark(DEBUG_TYPE, \"PromoteAllocas\", DLoc,\n+ &Caller->getEntryBlock())\n+ << \"Promoting \" << ore::NV(\"NumAlloca\", PromotableAllocas.size())\n+ << \" allocas to SSA registers in function '\"\n+ << ore::NV(\"Function\", Caller) << \"'\";\n+ });\n+ LLVM_DEBUG(dbgs() << \"Promoted \" << PromotableAllocas.size()\n+ << \" allocas to registers in function \" << Caller->getName()\n+ << \"\\n\");\n+}\n+\n+/// We use a different visitation order of functions here to solve a phase\n+/// ordering problem. After inlining, a caller function may have allocas that\n+/// were previously used for passing reference arguments to the callee that\n+/// are now promotable to registers, using SROA/mem2reg. However if we just let\n+/// the AlwaysInliner continue inlining everything at once, the later SROA pass\n+/// in the pipeline will end up placing phis for these allocas into blocks along\n+/// the dominance frontier which may extend further than desired (e.g. loop\n+/// headers). This can happen when the caller is then inlined into another\n+/// caller, and the allocas end up hoisted further before SROA is run.\n+///\n+/// Instead what we want is to try to do, as best as we can, is to inline leaf\n+/// functions into callers, and then run PromoteMemToReg() on the allocas that\n+/// were passed into the callee before it was inlined.\n+///\n+/// We want to do this *before* the caller is inlined into another caller\n+/// because we want the alloca promotion to happen before its scope extends too\n+/// far because of further inlining.\n+///\n+/// Here's a simple pseudo-example:\n+/// outermost_caller() {\n+/// for (...) {\n+/// middle_caller();\n+/// }\n+/// }\n+///\n+/// middle_caller() {\n+/// int stack_var;\n+/// inner_callee(&stack_var);\n+/// }\n+///\n+/// inner_callee(int *x) {\n+/// // Do something with x.\n+/// }\n+///\n+/// In this case, we want to inline inner_callee() into middle_caller() and\n+/// then promote stack_var to a register before we inline middle_caller() into\n+/// outermost_caller(). The regular always_inliner would inline everything at\n+/// once, and then SROA/mem2reg would promote stack_var to a register but in\n+/// the context of outermost_caller() which is not what we want.",
"comment_created_at": "2025-06-25T22:12:27+00:00",
|
|
"comment_author": "aemerson",
|
|
"comment_body": "> (picking on \"Avoiding hoisting large vector allocas to the entry block\") this happens in vectorizable kind of code? Asking (to learn) because in the experiments I mentioned, when we changed traversal order, we also (orthogonally) postponed function simplification to after all inlining was done, with no discernable performance effect, but for datacenter kind of apps, though.\r\n\r\nThe motivating use case happens to be builtins code written with multiple large vectors (like `<16 x float>`) using the Arm SME extension. It's common for programmers to define these large vector values on the stack and then pass references to helper functions. It's these allocas that we're trying to ensure get promoted at the right time. You won't see this in general code like loop-vectorization since that all happens after inlining.\r\n\r\nThat said, this isn't solely restricted to builtins/vectors. We also see some negligible changes (I wouldn't go as far as \"improvements\") in WebKit which is a very heavy user of `always_inline`.\r\n> \r\n> (also to learn/understand) by delaying promoting the allocas, you're kind of pre-spilling live ranges, right?\r\n\r\nNot sure I understand the question. Here we're not really delaying the promotion, but eagerly promoting at the smallest scope possible. In general though yes, if you were to avoid promotion/mem2reg you'd effectively force spilling and hope that other memory optimizations clean things up (unlikely).\r\n\r\n",
"pr_file_module": null
|
|
}
|
|
]
|
|
}
|
|
] |