Add 41 awesome reviewers for PostHog/posthog

internal-baz-ci-app[bot]
2025-08-19 12:19:58 +00:00
committed by GitHub
parent 7bd847ea32
commit d30a402e74
78 changed files with 5738 additions and 0 deletions

File diff suppressed because one or more lines are too long


@@ -0,0 +1,38 @@
---
title: Add explanatory tooltips
description: When UI elements have unclear functionality or purpose, add tooltips
to provide immediate context and explanation. This is especially important for icons,
checkboxes, and interactive elements where the behavior or consequences aren't immediately
obvious to users.
repository: PostHog/posthog
label: Documentation
language: TSX
comments_count: 3
repository_stars: 28460
---
When UI elements have unclear functionality or purpose, add tooltips to provide immediate context and explanation. This is especially important for icons, checkboxes, and interactive elements where the behavior or consequences aren't immediately obvious to users.

Tooltips should:

- Explain what the element does in clear, user-friendly language
- Describe any non-obvious consequences or behaviors
- Use terminology consistent with the rest of the UI
- Link to documentation when appropriate (though some tooltip implementations only support strings)

Example from the discussions:
```tsx
<LemonCheckbox
    checked={!!filter.optionalInFunnel}
    onChange={(checked) => {
        updateFilterOptional({
            ...filter,
            optionalInFunnel: checked,
            index,
        })
    }}
    label="Optional step"
    tooltip="When checked, this step won't cause users to drop out of the funnel if they skip it"
/>
```
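Where the tooltip implementation accepts a React node rather than only a string, the documentation-link guideline can be applied directly. A minimal sketch — the `Tooltip` import path and props here are assumptions for illustration, not a confirmed Lemon UI API:
```tsx
import { Tooltip } from 'lib/lemon-ui/Tooltip' // hypothetical import path

export function OptionalStepLabel(): JSX.Element {
    return (
        <Tooltip
            // Assumes `title` accepts a ReactNode; if the implementation only
            // supports strings, fall back to plain text per the note above
            title={
                <>
                    Optional steps won't cause users to drop out of the funnel when skipped.{' '}
                    <a href="https://posthog.com/docs">Learn more</a>
                </>
            }
        >
            <span>Optional step</span>
        </Tooltip>
    )
}
```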
This practice improves user experience by providing inline documentation that helps users understand functionality without needing to consult external documentation or guess at behavior.


@@ -0,0 +1,70 @@
[
{
"discussion_id": "2284256153",
"pr_number": 36307,
"pr_file": "plugin-server/src/main/ingestion-queues/session-recording-v2/retention/retention-service.ts",
"created_at": "2025-08-19T06:33:51+00:00",
"commented_code": "+import { RedisPool, Team } from '../../../../types'\n+import { TeamId } from '../../../../types'\n+import { BackgroundRefresher } from '../../../../utils/background-refresher'\n+import { PostgresRouter, PostgresUse } from '../../../../utils/db/postgres'\n+import { logger } from '../../../../utils/logger'\n+import { ValidRetentionPeriods } from '../constants'\n+import { MessageWithTeam } from '../teams/types'\n+import { RetentionPeriod } from '../types'\n+import { MessageWithRetention } from './types'\n+\n+function isValidRetentionPeriod(retentionPeriod: string): retentionPeriod is RetentionPeriod {\n+ return ValidRetentionPeriods.includes(retentionPeriod as RetentionPeriod)\n+}\n+\n+export class RetentionService {\n+ private readonly retentionRefresher: BackgroundRefresher<Record<TeamId, RetentionPeriod>>\n+\n+ constructor(\n+ private postgres: PostgresRouter,\n+ private redisPool: RedisPool,\n+ private keyPrefix = '@posthog/replay/'\n+ ) {\n+ this.retentionRefresher = new BackgroundRefresher(\n+ () => this.fetchTeamRetentionPeriods(),\n+ 5 * 60 * 1000, // 5 minutes\n+ (e) => {\n+ // We ignore the error and wait for postgres to recover\n+ logger.error('Error refreshing team retention periods', e)",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2284256153",
"repo_full_name": "PostHog/posthog",
"pr_number": 36307,
"pr_file": "plugin-server/src/main/ingestion-queues/session-recording-v2/retention/retention-service.ts",
"discussion_id": "2284256153",
"commented_code": "@@ -0,0 +1,111 @@\n+import { RedisPool, Team } from '../../../../types'\n+import { TeamId } from '../../../../types'\n+import { BackgroundRefresher } from '../../../../utils/background-refresher'\n+import { PostgresRouter, PostgresUse } from '../../../../utils/db/postgres'\n+import { logger } from '../../../../utils/logger'\n+import { ValidRetentionPeriods } from '../constants'\n+import { MessageWithTeam } from '../teams/types'\n+import { RetentionPeriod } from '../types'\n+import { MessageWithRetention } from './types'\n+\n+function isValidRetentionPeriod(retentionPeriod: string): retentionPeriod is RetentionPeriod {\n+ return ValidRetentionPeriods.includes(retentionPeriod as RetentionPeriod)\n+}\n+\n+export class RetentionService {\n+ private readonly retentionRefresher: BackgroundRefresher<Record<TeamId, RetentionPeriod>>\n+\n+ constructor(\n+ private postgres: PostgresRouter,\n+ private redisPool: RedisPool,\n+ private keyPrefix = '@posthog/replay/'\n+ ) {\n+ this.retentionRefresher = new BackgroundRefresher(\n+ () => this.fetchTeamRetentionPeriods(),\n+ 5 * 60 * 1000, // 5 minutes\n+ (e) => {\n+ // We ignore the error and wait for postgres to recover\n+ logger.error('Error refreshing team retention periods', e)",
"comment_created_at": "2025-08-19T06:33:51+00:00",
"comment_author": "pauldambra",
"comment_body": "we could have a prom counter here \ud83e\udd14 trying to picture what grafana dashboard I'd want to make to know if this all was working when deployed (and what we might want to alert on...)",
"pr_file_module": null
},
{
"comment_id": "2284549572",
"repo_full_name": "PostHog/posthog",
"pr_number": 36307,
"pr_file": "plugin-server/src/main/ingestion-queues/session-recording-v2/retention/retention-service.ts",
"discussion_id": "2284256153",
"commented_code": "@@ -0,0 +1,111 @@\n+import { RedisPool, Team } from '../../../../types'\n+import { TeamId } from '../../../../types'\n+import { BackgroundRefresher } from '../../../../utils/background-refresher'\n+import { PostgresRouter, PostgresUse } from '../../../../utils/db/postgres'\n+import { logger } from '../../../../utils/logger'\n+import { ValidRetentionPeriods } from '../constants'\n+import { MessageWithTeam } from '../teams/types'\n+import { RetentionPeriod } from '../types'\n+import { MessageWithRetention } from './types'\n+\n+function isValidRetentionPeriod(retentionPeriod: string): retentionPeriod is RetentionPeriod {\n+ return ValidRetentionPeriods.includes(retentionPeriod as RetentionPeriod)\n+}\n+\n+export class RetentionService {\n+ private readonly retentionRefresher: BackgroundRefresher<Record<TeamId, RetentionPeriod>>\n+\n+ constructor(\n+ private postgres: PostgresRouter,\n+ private redisPool: RedisPool,\n+ private keyPrefix = '@posthog/replay/'\n+ ) {\n+ this.retentionRefresher = new BackgroundRefresher(\n+ () => this.fetchTeamRetentionPeriods(),\n+ 5 * 60 * 1000, // 5 minutes\n+ (e) => {\n+ // We ignore the error and wait for postgres to recover\n+ logger.error('Error refreshing team retention periods', e)",
"comment_created_at": "2025-08-19T08:36:45+00:00",
"comment_author": "TueHaulund",
"comment_body": "This code is somewhat plagiarized from the teams service: https://github.com/PostHog/posthog/blob/master/plugin-server/src/main/ingestion-queues/session-recording-v2/teams/team-service.ts#L12\r\n\r\nGiven that it's quite battle tested I consider this part of the PR quite low risk. That being said a counter here (and also in the teams service) would make sense so it's not just failing silently for a long time, eventually we would have a very stale in-memory copy that would start to cause issues.",
"pr_file_module": null
},
{
"comment_id": "2284552442",
"repo_full_name": "PostHog/posthog",
"pr_number": 36307,
"pr_file": "plugin-server/src/main/ingestion-queues/session-recording-v2/retention/retention-service.ts",
"discussion_id": "2284256153",
"commented_code": "@@ -0,0 +1,111 @@\n+import { RedisPool, Team } from '../../../../types'\n+import { TeamId } from '../../../../types'\n+import { BackgroundRefresher } from '../../../../utils/background-refresher'\n+import { PostgresRouter, PostgresUse } from '../../../../utils/db/postgres'\n+import { logger } from '../../../../utils/logger'\n+import { ValidRetentionPeriods } from '../constants'\n+import { MessageWithTeam } from '../teams/types'\n+import { RetentionPeriod } from '../types'\n+import { MessageWithRetention } from './types'\n+\n+function isValidRetentionPeriod(retentionPeriod: string): retentionPeriod is RetentionPeriod {\n+ return ValidRetentionPeriods.includes(retentionPeriod as RetentionPeriod)\n+}\n+\n+export class RetentionService {\n+ private readonly retentionRefresher: BackgroundRefresher<Record<TeamId, RetentionPeriod>>\n+\n+ constructor(\n+ private postgres: PostgresRouter,\n+ private redisPool: RedisPool,\n+ private keyPrefix = '@posthog/replay/'\n+ ) {\n+ this.retentionRefresher = new BackgroundRefresher(\n+ () => this.fetchTeamRetentionPeriods(),\n+ 5 * 60 * 1000, // 5 minutes\n+ (e) => {\n+ // We ignore the error and wait for postgres to recover\n+ logger.error('Error refreshing team retention periods', e)",
"comment_created_at": "2025-08-19T08:37:51+00:00",
"comment_author": "TueHaulund",
"comment_body": "Will add an error counter \ud83d\udc4d ",
"pr_file_module": null
}
]
},
{
"discussion_id": "2259808025",
"pr_number": 36291,
"pr_file": "plugin-server/src/worker/ingestion/event-pipeline/runner.ts",
"created_at": "2025-08-07T10:11:07+00:00",
"commented_code": "if (!key) {\n return false // for safety don't drop events here, they are later dropped in teamDataPopulation\n }\n+\n+ if (event.event === '$exception') {",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2259808025",
"repo_full_name": "PostHog/posthog",
"pr_number": 36291,
"pr_file": "plugin-server/src/worker/ingestion/event-pipeline/runner.ts",
"discussion_id": "2259808025",
"commented_code": "@@ -85,6 +84,13 @@ export class EventPipelineRunner {\n if (!key) {\n return false // for safety don't drop events here, they are later dropped in teamDataPopulation\n }\n+\n+ if (event.event === '$exception') {",
"comment_created_at": "2025-08-07T10:11:07+00:00",
"comment_author": "oliverb123",
"comment_body": "Might be nice to add a metric here we can have an alert on?",
"pr_file_module": null
}
]
}
]


@@ -0,0 +1,31 @@
---
title: Add monitoring metrics
description: Critical code paths, especially error handling and exception scenarios,
should include metrics or counters to enable monitoring and alerting. This prevents
systems from failing silently and provides visibility into system health.
repository: PostHog/posthog
label: Observability
language: TypeScript
comments_count: 2
repository_stars: 28460
---
Critical code paths, especially error handling and exception scenarios, should include metrics or counters to enable monitoring and alerting. This prevents systems from failing silently and provides visibility into system health.

When implementing error handling, background processes, or exception catching, add appropriate metrics that can be used for dashboards and alerts. This is particularly important for services that could degrade gracefully but cause issues over time if problems go unnoticed.

Example:
```typescript
// Before: Silent error handling
(e) => {
    logger.error('Error refreshing team retention periods', e)
}

// After: Add metrics for monitoring
(e) => {
    logger.error('Error refreshing team retention periods', e)
    statsd.increment('retention_service.refresh_error')
}
```
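The underlying discussion suggests a Prometheus counter specifically. A minimal `prom-client` sketch along those lines — the metric name and wiring are illustrative assumptions, not the code that shipped:
```typescript
import { Counter } from 'prom-client'

// Illustrative metric name; match your existing naming conventions
const retentionRefreshErrors = new Counter({
    name: 'retention_service_refresh_errors_total',
    help: 'Errors while refreshing team retention periods from Postgres',
})

// Error callback passed to the background refresher (logger assumed in scope)
const onRefreshError = (e: unknown): void => {
    logger.error('Error refreshing team retention periods', e)
    retentionRefreshErrors.inc() // alert when this rises over a sustained window
}
```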
Consider what dashboards and alerts you would want to create when the code is deployed. If you can't easily monitor whether a system is working correctly, add the necessary instrumentation.


@@ -0,0 +1,246 @@
[
{
"discussion_id": "2262792462",
"pr_number": 36374,
"pr_file": "products/surveys/backend/prompts.py",
"created_at": "2025-08-08T12:09:10+00:00",
"commented_code": "**Simple NPS**: \"Create an NPS survey\"\n **Targeted Feedback**: \"Get feedback on the dashboard from mobile users\"\n **Complex Research**: \"Survey users about our pricing page experience\"\n+**Feature Flag Targeting**: \"Survey users who have the new-dashboard feature flag enabled\"\n+**Feature Experiment Feedback**: \"Get feedback from beta users in the notebooks-redesign experiment\"",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2262792462",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "products/surveys/backend/prompts.py",
"discussion_id": "2262792462",
"commented_code": "@@ -106,6 +112,10 @@\n **Simple NPS**: \"Create an NPS survey\"\n **Targeted Feedback**: \"Get feedback on the dashboard from mobile users\"\n **Complex Research**: \"Survey users about our pricing page experience\"\n+**Feature Flag Targeting**: \"Survey users who have the new-dashboard feature flag enabled\"\n+**Feature Experiment Feedback**: \"Get feedback from beta users in the notebooks-redesign experiment\"",
"comment_created_at": "2025-08-08T12:09:10+00:00",
"comment_author": "marandaneto",
"comment_body": "How will Max AI get the `linked_flag_id` from a `new-dashboard` feature flag key? What else would you need to do to make this work?",
"pr_file_module": null
},
{
"comment_id": "2262833348",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "products/surveys/backend/prompts.py",
"discussion_id": "2262792462",
"commented_code": "@@ -106,6 +112,10 @@\n **Simple NPS**: \"Create an NPS survey\"\n **Targeted Feedback**: \"Get feedback on the dashboard from mobile users\"\n **Complex Research**: \"Survey users about our pricing page experience\"\n+**Feature Flag Targeting**: \"Survey users who have the new-dashboard feature flag enabled\"\n+**Feature Experiment Feedback**: \"Get feedback from beta users in the notebooks-redesign experiment\"",
"comment_created_at": "2025-08-08T12:23:05+00:00",
"comment_author": "marandaneto",
"comment_body": "@PostHog/team-max-ai ping since i am not sure how to proceed here, thanks.",
"pr_file_module": null
},
{
"comment_id": "2262857045",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "products/surveys/backend/prompts.py",
"discussion_id": "2262792462",
"commented_code": "@@ -106,6 +112,10 @@\n **Simple NPS**: \"Create an NPS survey\"\n **Targeted Feedback**: \"Get feedback on the dashboard from mobile users\"\n **Complex Research**: \"Survey users about our pricing page experience\"\n+**Feature Flag Targeting**: \"Survey users who have the new-dashboard feature flag enabled\"\n+**Feature Experiment Feedback**: \"Get feedback from beta users in the notebooks-redesign experiment\"",
"comment_created_at": "2025-08-08T12:34:28+00:00",
"comment_author": "marandaneto",
"comment_body": "Another option would be to allow setting `linked_flag_key` and have our API fetch the id by key, or would Max do that for us?",
"pr_file_module": null
},
{
"comment_id": "2262916666",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "products/surveys/backend/prompts.py",
"discussion_id": "2262792462",
"commented_code": "@@ -106,6 +112,10 @@\n **Simple NPS**: \"Create an NPS survey\"\n **Targeted Feedback**: \"Get feedback on the dashboard from mobile users\"\n **Complex Research**: \"Survey users about our pricing page experience\"\n+**Feature Flag Targeting**: \"Survey users who have the new-dashboard feature flag enabled\"\n+**Feature Experiment Feedback**: \"Get feedback from beta users in the notebooks-redesign experiment\"",
"comment_created_at": "2025-08-08T13:02:25+00:00",
"comment_author": "denakorita",
"comment_body": "Hey @marandaneto, you can use `bind_tools` in order to attach tools to your model and specify what it should use that tool for. In this case your tool would be sth like `retrieve_flag_id(feature_key)` (here the API call to fetch the key)\r\nThe llm basically should be taught to identify the `feature_key` in the user input and then use that to call the tool I mentioned above. \r\nIn order for the llm to identify the `feature_key` you need to maybe provide the entire list of feature keys we have in the prompt (i guess it should not be too long) or have another tool `retrieve_feature_keys` and have the llm do that first then in a second step use the feature key to retrieve its ID. \r\n",
"pr_file_module": null
},
{
"comment_id": "2262960139",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "products/surveys/backend/prompts.py",
"discussion_id": "2262792462",
"commented_code": "@@ -106,6 +112,10 @@\n **Simple NPS**: \"Create an NPS survey\"\n **Targeted Feedback**: \"Get feedback on the dashboard from mobile users\"\n **Complex Research**: \"Survey users about our pricing page experience\"\n+**Feature Flag Targeting**: \"Survey users who have the new-dashboard feature flag enabled\"\n+**Feature Experiment Feedback**: \"Get feedback from beta users in the notebooks-redesign experiment\"",
"comment_created_at": "2025-08-08T13:19:26+00:00",
"comment_author": "marandaneto",
"comment_body": "thanks @denakorita \r\nlemme check if i find some examples about the `bind_tools`",
"pr_file_module": null
},
{
"comment_id": "2262970515",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "products/surveys/backend/prompts.py",
"discussion_id": "2262792462",
"commented_code": "@@ -106,6 +112,10 @@\n **Simple NPS**: \"Create an NPS survey\"\n **Targeted Feedback**: \"Get feedback on the dashboard from mobile users\"\n **Complex Research**: \"Survey users about our pricing page experience\"\n+**Feature Flag Targeting**: \"Survey users who have the new-dashboard feature flag enabled\"\n+**Feature Experiment Feedback**: \"Get feedback from beta users in the notebooks-redesign experiment\"",
"comment_created_at": "2025-08-08T13:23:56+00:00",
"comment_author": "denakorita",
"comment_body": "@marandaneto np, one example that comes to mind is `ee/hogai/graph/taxonomy/nodes.py`, check the prompts there too, yours should be much simpler (that prompt is huge, you maybe dont even need to add tool instructions in the prompt, the tool description should suffice). Let me know if you face any issue, would be happy to help. Ping me via slack. ",
"pr_file_module": null
},
{
"comment_id": "2264933795",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "products/surveys/backend/prompts.py",
"discussion_id": "2262792462",
"commented_code": "@@ -106,6 +112,10 @@\n **Simple NPS**: \"Create an NPS survey\"\n **Targeted Feedback**: \"Get feedback on the dashboard from mobile users\"\n **Complex Research**: \"Survey users about our pricing page experience\"\n+**Feature Flag Targeting**: \"Survey users who have the new-dashboard feature flag enabled\"\n+**Feature Experiment Feedback**: \"Get feedback from beta users in the notebooks-redesign experiment\"",
"comment_created_at": "2025-08-09T19:17:30+00:00",
"comment_author": "lucasheriques",
"comment_body": "@denakorita @marandaneto I took a stab at this problem, and though that, instead of using a LLM call, we could using string matching instead to make it faster and reduce cost. [Here's the commit](https://github.com/PostHog/posthog/pull/36420/commits/45c4aff586d7642c475e85bae0c241521a066252), what do you think?\r\n\r\nAnother alternative is probably get all feature flags on request, and create a dictionary with `ff_key: ff_id` and add that on the prompt.\r\n\r\nI tried adding a tool to fetch that, but at least for me I couldn't make it work reliably",
"pr_file_module": null
},
{
"comment_id": "2266098263",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "products/surveys/backend/prompts.py",
"discussion_id": "2262792462",
"commented_code": "@@ -106,6 +112,10 @@\n **Simple NPS**: \"Create an NPS survey\"\n **Targeted Feedback**: \"Get feedback on the dashboard from mobile users\"\n **Complex Research**: \"Survey users about our pricing page experience\"\n+**Feature Flag Targeting**: \"Survey users who have the new-dashboard feature flag enabled\"\n+**Feature Experiment Feedback**: \"Get feedback from beta users in the notebooks-redesign experiment\"",
"comment_created_at": "2025-08-11T09:09:49+00:00",
"comment_author": "marandaneto",
"comment_body": "> We went looking everywhere, but couldn\u2019t find those commits.\r\n\r\ninvalid commit link",
"pr_file_module": null
},
{
"comment_id": "2266101279",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "products/surveys/backend/prompts.py",
"discussion_id": "2262792462",
"commented_code": "@@ -106,6 +112,10 @@\n **Simple NPS**: \"Create an NPS survey\"\n **Targeted Feedback**: \"Get feedback on the dashboard from mobile users\"\n **Complex Research**: \"Survey users about our pricing page experience\"\n+**Feature Flag Targeting**: \"Survey users who have the new-dashboard feature flag enabled\"\n+**Feature Experiment Feedback**: \"Get feedback from beta users in the notebooks-redesign experiment\"",
"comment_created_at": "2025-08-11T09:11:12+00:00",
"comment_author": "marandaneto",
"comment_body": "> instead to make it faster and reduce cost\r\n\r\nnot sure about the changes if they make sense or not (since commit link is broken), but i'd get it working first and then optimize for the cost if thats ever an issue, assuming the LLM call is the way how all the other Max AI tools do",
"pr_file_module": null
},
{
"comment_id": "2266112175",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "products/surveys/backend/prompts.py",
"discussion_id": "2262792462",
"commented_code": "@@ -106,6 +112,10 @@\n **Simple NPS**: \"Create an NPS survey\"\n **Targeted Feedback**: \"Get feedback on the dashboard from mobile users\"\n **Complex Research**: \"Survey users about our pricing page experience\"\n+**Feature Flag Targeting**: \"Survey users who have the new-dashboard feature flag enabled\"\n+**Feature Experiment Feedback**: \"Get feedback from beta users in the notebooks-redesign experiment\"",
"comment_created_at": "2025-08-11T09:16:15+00:00",
"comment_author": "denakorita",
"comment_body": "@lucasheriques hey lucas, if the list of the feature flags is not very long for the organisations then I would recommend that you build the mapping and inject it to the prompt like u mentioned and avoid tool calling. ",
"pr_file_module": null
},
{
"comment_id": "2266122057",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "products/surveys/backend/prompts.py",
"discussion_id": "2262792462",
"commented_code": "@@ -106,6 +112,10 @@\n **Simple NPS**: \"Create an NPS survey\"\n **Targeted Feedback**: \"Get feedback on the dashboard from mobile users\"\n **Complex Research**: \"Survey users about our pricing page experience\"\n+**Feature Flag Targeting**: \"Survey users who have the new-dashboard feature flag enabled\"\n+**Feature Experiment Feedback**: \"Get feedback from beta users in the notebooks-redesign experiment\"",
"comment_created_at": "2025-08-11T09:20:27+00:00",
"comment_author": "marandaneto",
"comment_body": "> if the list of feature flags is not very long for the organisations\r\n\r\nWe are trading a possible LLM call vs a PostHog API call (always).\r\nThe LLM will have to process the response of the API call even if it is not needed (the prompt does not mention flags but we did give a list of flags).\r\nWe never know how many flags the org will have, can be N or Y.\r\n\r\nI don't know if it pays off.",
"pr_file_module": null
},
{
"comment_id": "2267117312",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "products/surveys/backend/prompts.py",
"discussion_id": "2262792462",
"commented_code": "@@ -106,6 +112,10 @@\n **Simple NPS**: \"Create an NPS survey\"\n **Targeted Feedback**: \"Get feedback on the dashboard from mobile users\"\n **Complex Research**: \"Survey users about our pricing page experience\"\n+**Feature Flag Targeting**: \"Survey users who have the new-dashboard feature flag enabled\"\n+**Feature Experiment Feedback**: \"Get feedback from beta users in the notebooks-redesign experiment\"",
"comment_created_at": "2025-08-11T15:22:26+00:00",
"comment_author": "lucasheriques",
"comment_body": "here's the proper commit, sorry, [wrong link](https://github.com/PostHog/posthog/pull/36420/commits/dd0741118e1d1e7dd791b94a8d5214e2c37bfc36)",
"pr_file_module": null
}
]
},
{
"discussion_id": "2245822928",
"pr_number": 35726,
"pr_file": "ee/hogai/graph/root/prompts.py",
"created_at": "2025-07-31T16:13:04+00:00",
"commented_code": "The tool `create_and_query_insight` generates an arbitrary new query (aka insight) based on the provided parameters, executes the query, and returns the formatted results.\n The tool only retrieves a single query per call. If the user asks for multiple insights, you need to decompose a query into multiple subqueries and call the tool for each subquery.\n \n+CRITICAL ROUTING LOGIC:\n+- On the FIRST request for insights: Perform a search for existing insights first (using `search_insights` tool), then decide whether to use existing ones or create new ones. (remember to set state.root_tool_insight_plan to the user's query)",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2245822928",
"repo_full_name": "PostHog/posthog",
"pr_number": 35726,
"pr_file": "ee/hogai/graph/root/prompts.py",
"discussion_id": "2245822928",
"commented_code": "@@ -58,6 +58,11 @@\n The tool `create_and_query_insight` generates an arbitrary new query (aka insight) based on the provided parameters, executes the query, and returns the formatted results.\n The tool only retrieves a single query per call. If the user asks for multiple insights, you need to decompose a query into multiple subqueries and call the tool for each subquery.\n \n+CRITICAL ROUTING LOGIC:\n+- On the FIRST request for insights: Perform a search for existing insights first (using `search_insights` tool), then decide whether to use existing ones or create new ones. (remember to set state.root_tool_insight_plan to the user's query)",
"comment_created_at": "2025-07-31T16:13:04+00:00",
"comment_author": "kappa90",
"comment_body": "the LLM doesn't know what `state.root_tool_insight_plan` is, it can only do tool calls specifying an arg (`query_description` in the case of `create_and_query_insight`)",
"pr_file_module": null
},
{
"comment_id": "2246003978",
"repo_full_name": "PostHog/posthog",
"pr_number": 35726,
"pr_file": "ee/hogai/graph/root/prompts.py",
"discussion_id": "2245822928",
"commented_code": "@@ -58,6 +58,11 @@\n The tool `create_and_query_insight` generates an arbitrary new query (aka insight) based on the provided parameters, executes the query, and returns the formatted results.\n The tool only retrieves a single query per call. If the user asks for multiple insights, you need to decompose a query into multiple subqueries and call the tool for each subquery.\n \n+CRITICAL ROUTING LOGIC:\n+- On the FIRST request for insights: Perform a search for existing insights first (using `search_insights` tool), then decide whether to use existing ones or create new ones. (remember to set state.root_tool_insight_plan to the user's query)",
"comment_created_at": "2025-07-31T17:37:53+00:00",
"comment_author": "tatoalo",
"comment_body": "Yes @kappa90, totally right here as well! This was just a test that I performed but forgot to remove, it indeed had no effect at all :) ",
"pr_file_module": null
}
]
},
{
"discussion_id": "2251887833",
"pr_number": 35726,
"pr_file": "ee/hogai/graph/insights/nodes.py",
"created_at": "2025-08-04T15:51:01+00:00",
"commented_code": "root_tool_call_id=None,\n )\n \n- def router(self, state: AssistantState) -> Literal[\"end\", \"root\"]:\n+ def _evaluate_insights_for_creation(self, selected_insights: list[int], user_query: str) -> dict:",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2251887833",
"repo_full_name": "PostHog/posthog",
"pr_number": 35726,
"pr_file": "ee/hogai/graph/insights/nodes.py",
"discussion_id": "2251887833",
"commented_code": "@@ -313,7 +528,149 @@ def _create_error_response(self, content: str, tool_call_id: str | None) -> Part\n root_tool_call_id=None,\n )\n \n- def router(self, state: AssistantState) -> Literal[\"end\", \"root\"]:\n+ def _evaluate_insights_for_creation(self, selected_insights: list[int], user_query: str) -> dict:",
"comment_created_at": "2025-08-04T15:51:01+00:00",
"comment_author": "kappa90",
"comment_body": "nit: this tells you if the set of insights is useful, rather than telling which specific ones should be kept; if you moved to tool calls, you could easily ask the LLM to select the right ones directly, with explanation and everything.",
"pr_file_module": null
},
{
"comment_id": "2256958546",
"repo_full_name": "PostHog/posthog",
"pr_number": 35726,
"pr_file": "ee/hogai/graph/insights/nodes.py",
"discussion_id": "2251887833",
"commented_code": "@@ -313,7 +528,149 @@ def _create_error_response(self, content: str, tool_call_id: str | None) -> Part\n root_tool_call_id=None,\n )\n \n- def router(self, state: AssistantState) -> Literal[\"end\", \"root\"]:\n+ def _evaluate_insights_for_creation(self, selected_insights: list[int], user_query: str) -> dict:",
"comment_created_at": "2025-08-06T12:14:48+00:00",
"comment_author": "tatoalo",
"comment_body": "Great point @kappa90! Refactoring to using more tool calls!",
"pr_file_module": null
}
]
},
{
"discussion_id": "2267375367",
"pr_number": 36420,
"pr_file": "products/surveys/backend/max_tools.py",
"created_at": "2025-08-11T16:32:45+00:00",
"commented_code": "async def _create_survey_from_instructions(self, instructions: str) -> SurveyCreationSchema:\n \"\"\"\n- Create a survey from natural language instructions.\n+ Create a survey from natural language instructions using PostHog-native pattern.\n \"\"\"\n- # Create the prompt\n- prompt = ChatPromptTemplate.from_messages(\n- [\n- (\"system\", SURVEY_CREATION_SYSTEM_PROMPT),\n- (\"human\", \"Create a survey based on these instructions: {{{instructions}}}\"),\n- ],\n- template_format=\"mustache\",\n- )\n+ logger.info(f\"Starting survey creation with instructions: '{instructions}'\")\n+\n+ # Extract and lookup feature flags inline (following PostHog pattern)\n+ feature_flag_context = await self._extract_feature_flags_inline(instructions)\n+\n+ # Build enhanced system prompt with feature flag information\n+ enhanced_system_prompt = SURVEY_CREATION_SYSTEM_PROMPT\n+ if feature_flag_context:\n+ enhanced_system_prompt += f\"\n\n## Available Feature Flags\n{feature_flag_context}\"\n+\n+ # Single LLM call with all context (cost-effective, fast)\n+ prompt = ChatPromptTemplate.from_messages([\n+ (\"system\", enhanced_system_prompt),\n+ (\"human\", \"Create a survey based on these instructions: {{{instructions}}}\")\n+ ], template_format=\"mustache\")\n \n- # Set up the LLM with structured output\n model = (\n- ChatOpenAI(model=\"gpt-4.1-mini\", temperature=0.2)\n+ ChatOpenAI(model=\"gpt-5-mini\")",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2267375367",
"repo_full_name": "PostHog/posthog",
"pr_number": 36420,
"pr_file": "products/surveys/backend/max_tools.py",
"discussion_id": "2267375367",
"commented_code": "@@ -42,34 +45,38 @@ class CreateSurveyTool(MaxTool):\n \n async def _create_survey_from_instructions(self, instructions: str) -> SurveyCreationSchema:\n \"\"\"\n- Create a survey from natural language instructions.\n+ Create a survey from natural language instructions using PostHog-native pattern.\n \"\"\"\n- # Create the prompt\n- prompt = ChatPromptTemplate.from_messages(\n- [\n- (\"system\", SURVEY_CREATION_SYSTEM_PROMPT),\n- (\"human\", \"Create a survey based on these instructions: {{{instructions}}}\"),\n- ],\n- template_format=\"mustache\",\n- )\n+ logger.info(f\"Starting survey creation with instructions: '{instructions}'\")\n+\n+ # Extract and lookup feature flags inline (following PostHog pattern)\n+ feature_flag_context = await self._extract_feature_flags_inline(instructions)\n+\n+ # Build enhanced system prompt with feature flag information\n+ enhanced_system_prompt = SURVEY_CREATION_SYSTEM_PROMPT\n+ if feature_flag_context:\n+ enhanced_system_prompt += f\"\\n\\n## Available Feature Flags\\n{feature_flag_context}\"\n+\n+ # Single LLM call with all context (cost-effective, fast)\n+ prompt = ChatPromptTemplate.from_messages([\n+ (\"system\", enhanced_system_prompt),\n+ (\"human\", \"Create a survey based on these instructions: {{{instructions}}}\")\n+ ], template_format=\"mustache\")\n \n- # Set up the LLM with structured output\n model = (\n- ChatOpenAI(model=\"gpt-4.1-mini\", temperature=0.2)\n+ ChatOpenAI(model=\"gpt-5-mini\")",
"comment_created_at": "2025-08-11T16:32:45+00:00",
"comment_author": "denakorita",
"comment_body": "How was the latency here? Also, did u have to upgrade `langchain` to use gpt5? \r\nMaybe also setting the `reasoning={\"effort\": \"low\"},` here might be best. ",
"pr_file_module": null
}
]
}
]


@@ -0,0 +1,40 @@
---
title: AI context efficiency
description: When providing context to LLMs, choose the most efficient method based
on the nature and size of the context data. For bounded, static context (like feature
flag mappings or configuration options), inject the information directly into the
system prompt rather than using tool calls. This approach is faster, more cost-effective,
and reduces complexity.
repository: PostHog/posthog
label: AI
language: Python
comments_count: 4
repository_stars: 28460
---
When providing context to LLMs, choose the most efficient method based on the nature and size of the context data. For bounded, static context (like feature flag mappings or configuration options), inject the information directly into the system prompt rather than using tool calls. This approach is faster, more cost-effective, and reduces complexity.

**Prefer prompt injection when:**

- Context data is relatively small and bounded
- Data doesn't change frequently during the conversation
- You want to minimize API calls and latency

**Use tool calls when:**

- Context data is large or unbounded
- Data needs to be fetched dynamically based on user input
- You need the LLM to make decisions about what context to retrieve

**Example:**
```python
# Good: Inject bounded feature flag context into prompt
enhanced_system_prompt = SURVEY_CREATION_SYSTEM_PROMPT
if feature_flag_context:
    enhanced_system_prompt += f"\n\n## Available Feature Flags\n{feature_flag_context}"

# Avoid: Using tool calls for static, bounded context
# This adds unnecessary complexity and cost
def retrieve_flag_id(feature_key):  # Tool call - overkill for small static data
    return api_call_to_get_flag_id(feature_key)
```
Also ensure your prompts only reference capabilities the LLM actually has - avoid instructing LLMs to manipulate internal state variables they cannot access.

File diff suppressed because one or more lines are too long


@@ -0,0 +1,37 @@
---
title: API initialization side effects
description: When initializing API clients, prefer bootstrap/configuration patterns
over method calls that may trigger unintended side effects like billing events,
data capture, or state changes. Method calls during initialization can have unexpected
consequences that users may not anticipate or want to pay for.
repository: PostHog/posthog
label: API
language: Html
comments_count: 2
repository_stars: 28460
---
When initializing API clients, prefer bootstrap/configuration patterns over method calls that may trigger unintended side effects like billing events, data capture, or state changes. Method calls during initialization can have unexpected consequences that users may not anticipate or want to pay for.

Instead of calling methods like `identify()`, `capture()`, or similar action-triggering functions during client setup, use configuration objects, bootstrap data, or initialization parameters to achieve the same result without side effects.

Example of problematic initialization:
```javascript
// This triggers an identify event that users get billed for
posthog.init(token, config);
posthog.identify(distinctId); // Captures billable event
```
Preferred approach using bootstrap configuration:
```javascript
// This achieves the same result without capturing events
const config = {
    api_host: projectConfig.api_host,
    bootstrap: {
        // posthog-js expects `distinctID` (note the capitalization);
        // this sets the identity without triggering an $identify event
        distinctID: distinctId,
        isIdentifiedID: true, // treat the bootstrapped ID as identified, not anonymous
    },
};
posthog.init(token, config);
```
This pattern ensures that client initialization only sets up the necessary state without triggering actions that have business implications or costs. Always consider whether initialization methods have side effects and prefer declarative configuration approaches when available.


@@ -0,0 +1,82 @@
[
{
"discussion_id": "2262796731",
"pr_number": 36374,
"pr_file": "frontend/src/scenes/surveys/SurveyEdit.tsx",
"created_at": "2025-08-08T12:10:09+00:00",
"commented_code": "null\n )\n // Reset variant selection when flag changes\n+ const {\n+ linkedFlagVariant,\n+ ...conditions\n+ } = survey.conditions\n setSurveyValue('conditions', {\n- ...survey.conditions,\n- linkedFlagVariant: null,\n+ ...conditions,",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2262796731",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "frontend/src/scenes/surveys/SurveyEdit.tsx",
"discussion_id": "2262796731",
"commented_code": "@@ -783,17 +783,23 @@ export default function SurveyEdit(): JSX.Element {\n null\n )\n // Reset variant selection when flag changes\n+ const {\n+ linkedFlagVariant,\n+ ...conditions\n+ } = survey.conditions\n setSurveyValue('conditions', {\n- ...survey.conditions,\n- linkedFlagVariant: null,\n+ ...conditions,",
"comment_created_at": "2025-08-08T12:10:09+00:00",
"comment_author": "marandaneto",
"comment_body": "we were calling the API with `{\r\n \"url\": \"\",\r\n \"linkedFlagVariant\": null\r\n}` instead of no `linkedFlagVariant` at all.",
"pr_file_module": null
},
{
"comment_id": "2262837694",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "frontend/src/scenes/surveys/SurveyEdit.tsx",
"discussion_id": "2262796731",
"commented_code": "@@ -783,17 +783,23 @@ export default function SurveyEdit(): JSX.Element {\n null\n )\n // Reset variant selection when flag changes\n+ const {\n+ linkedFlagVariant,\n+ ...conditions\n+ } = survey.conditions\n setSurveyValue('conditions', {\n- ...survey.conditions,\n- linkedFlagVariant: null,\n+ ...conditions,",
"comment_created_at": "2025-08-08T12:25:19+00:00",
"comment_author": "marandaneto",
"comment_body": "also fixed the `\"url\": \"\"` issue, `url` is also `Optional`",
"pr_file_module": null
}
]
},
{
"discussion_id": "2250144726",
"pr_number": 36080,
"pr_file": "frontend/src/scenes/surveys/SurveyOverview.tsx",
"created_at": "2025-08-04T13:49:14+00:00",
"commented_code": "export function SurveyOverview(): JSX.Element {\n const { survey, selectedPageIndex, targetingFlagFilters } = useValues(surveyLogic)\n const { setSelectedPageIndex } = useActions(surveyLogic)\n- const { featureFlags } = useValues(featureFlagLogic)\n+\n+ const isExternalSurvey = survey.type === SurveyType.ExternalSurvey\n \n const { surveyUsesLimit, surveyUsesAdaptiveLimit } = useValues(surveyLogic)\n return (\n <div className=\"flex gap-4\">\n <dl className=\"flex flex-col gap-4 flex-1 overflow-hidden\">\n- <SurveyOption label=\"Display mode\">{SURVEY_TYPE_LABEL_MAP[survey.type]}</SurveyOption>\n+ <SurveyOption label=\"Display mode\">\n+ <div className=\"flex flex-col\">\n+ <div className=\"flex flex-row items-center gap-2\">\n+ {SURVEY_TYPE_LABEL_MAP[survey.type]}\n+ {isExternalSurvey && <CopySurveyLink surveyId={survey.id} className=\"w-fit\" />}\n+ </div>\n+ {isExternalSurvey && (\n+ <span>\n+ Track responses to users by adding{' '}\n+ <code className=\"bg-surface-tertiary px-1 rounded\">?distinct_id=user@email.com</code> to\n+ the URL. Otherwise responses are anonymous.",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2251560706",
"repo_full_name": "PostHog/posthog",
"pr_number": 36080,
"pr_file": "frontend/src/scenes/surveys/SurveyOverview.tsx",
"discussion_id": "2250144726",
"commented_code": "@@ -54,13 +52,28 @@ const QuestionIconMap = {\n export function SurveyOverview(): JSX.Element {\n const { survey, selectedPageIndex, targetingFlagFilters } = useValues(surveyLogic)\n const { setSelectedPageIndex } = useActions(surveyLogic)\n- const { featureFlags } = useValues(featureFlagLogic)\n+\n+ const isExternalSurvey = survey.type === SurveyType.ExternalSurvey\n \n const { surveyUsesLimit, surveyUsesAdaptiveLimit } = useValues(surveyLogic)\n return (\n <div className=\"flex gap-4\">\n <dl className=\"flex flex-col gap-4 flex-1 overflow-hidden\">\n- <SurveyOption label=\"Display mode\">{SURVEY_TYPE_LABEL_MAP[survey.type]}</SurveyOption>\n+ <SurveyOption label=\"Display mode\">\n+ <div className=\"flex flex-col\">\n+ <div className=\"flex flex-row items-center gap-2\">\n+ {SURVEY_TYPE_LABEL_MAP[survey.type]}\n+ {isExternalSurvey && <CopySurveyLink surveyId={survey.id} className=\"w-fit\" />}\n+ </div>\n+ {isExternalSurvey && (\n+ <span>\n+ Track responses to users by adding{' '}\n+ <code className=\"bg-surface-tertiary px-1 rounded\">?distinct_id=user@email.com</code> to\n+ the URL. Otherwise responses are anonymous.",
"comment_created_at": "2025-08-04T13:49:14+00:00",
"comment_author": "lucasheriques",
"comment_body": "i used email because I think they're easier to reason about. @marandaneto you agreed with greptile here, but why do you think we should do that? \r\n\r\nMost people collect emails on their apps, and I think using it is more friendly to non-technical people than using any other kind of ID\r\n\r\nThis way, even if the responder is not a person in your app (so they don't have the person profile in PostHog), you can still identify who answered. Technical people and others that more familiar with PostHog might know better that the distinct_id can also have other formats. For the example, I prefer to optimize for non-technical users. WDYT?",
"pr_file_module": null
},
{
"comment_id": "2251575200",
"repo_full_name": "PostHog/posthog",
"pr_number": 36080,
"pr_file": "frontend/src/scenes/surveys/SurveyOverview.tsx",
"discussion_id": "2250144726",
"commented_code": "@@ -54,13 +52,28 @@ const QuestionIconMap = {\n export function SurveyOverview(): JSX.Element {\n const { survey, selectedPageIndex, targetingFlagFilters } = useValues(surveyLogic)\n const { setSelectedPageIndex } = useActions(surveyLogic)\n- const { featureFlags } = useValues(featureFlagLogic)\n+\n+ const isExternalSurvey = survey.type === SurveyType.ExternalSurvey\n \n const { surveyUsesLimit, surveyUsesAdaptiveLimit } = useValues(surveyLogic)\n return (\n <div className=\"flex gap-4\">\n <dl className=\"flex flex-col gap-4 flex-1 overflow-hidden\">\n- <SurveyOption label=\"Display mode\">{SURVEY_TYPE_LABEL_MAP[survey.type]}</SurveyOption>\n+ <SurveyOption label=\"Display mode\">\n+ <div className=\"flex flex-col\">\n+ <div className=\"flex flex-row items-center gap-2\">\n+ {SURVEY_TYPE_LABEL_MAP[survey.type]}\n+ {isExternalSurvey && <CopySurveyLink surveyId={survey.id} className=\"w-fit\" />}\n+ </div>\n+ {isExternalSurvey && (\n+ <span>\n+ Track responses to users by adding{' '}\n+ <code className=\"bg-surface-tertiary px-1 rounded\">?distinct_id=user@email.com</code> to\n+ the URL. Otherwise responses are anonymous.",
"comment_created_at": "2025-08-04T13:54:53+00:00",
"comment_author": "marandaneto",
"comment_body": "if you want to support email identification, you should allow the API to also receive an `email` query param and not `distinct_id` since this is a known and unique posthog identifier.\r\nright now passing a `distinct_id` and an email value is semantically wrong (unless people do that when calling `identify(distinctId=myEmail)` and will duplicate users and skew active user metrics etc\r\nthats why i prefer to do proper documentation about this and just link to the docs instead of duplicating all this info in the UI across multiple places/hints, etc",
"pr_file_module": null
},
{
"comment_id": "2251586686",
"repo_full_name": "PostHog/posthog",
"pr_number": 36080,
"pr_file": "frontend/src/scenes/surveys/SurveyOverview.tsx",
"discussion_id": "2250144726",
"commented_code": "@@ -54,13 +52,28 @@ const QuestionIconMap = {\n export function SurveyOverview(): JSX.Element {\n const { survey, selectedPageIndex, targetingFlagFilters } = useValues(surveyLogic)\n const { setSelectedPageIndex } = useActions(surveyLogic)\n- const { featureFlags } = useValues(featureFlagLogic)\n+\n+ const isExternalSurvey = survey.type === SurveyType.ExternalSurvey\n \n const { surveyUsesLimit, surveyUsesAdaptiveLimit } = useValues(surveyLogic)\n return (\n <div className=\"flex gap-4\">\n <dl className=\"flex flex-col gap-4 flex-1 overflow-hidden\">\n- <SurveyOption label=\"Display mode\">{SURVEY_TYPE_LABEL_MAP[survey.type]}</SurveyOption>\n+ <SurveyOption label=\"Display mode\">\n+ <div className=\"flex flex-col\">\n+ <div className=\"flex flex-row items-center gap-2\">\n+ {SURVEY_TYPE_LABEL_MAP[survey.type]}\n+ {isExternalSurvey && <CopySurveyLink surveyId={survey.id} className=\"w-fit\" />}\n+ </div>\n+ {isExternalSurvey && (\n+ <span>\n+ Track responses to users by adding{' '}\n+ <code className=\"bg-surface-tertiary px-1 rounded\">?distinct_id=user@email.com</code> to\n+ the URL. Otherwise responses are anonymous.",
"comment_created_at": "2025-08-04T13:58:04+00:00",
"comment_author": "lucasheriques",
"comment_body": "got it, makes sense. I'll address your comments to link to the docs. before I'll will fix the comments in the docs PR now, and link it here once it's merged",
"pr_file_module": null
}
]
}
]


@@ -0,0 +1,35 @@
---
title: API parameter semantics
description: Ensure API parameters have clear semantic meaning and avoid sending null
values for optional fields. When designing API endpoints, use parameter names that
accurately reflect their purpose and data type. For optional fields, omit them entirely
from the payload rather than setting them to null, as this creates cleaner API contracts
and prevents confusion...
repository: PostHog/posthog
label: API
language: TSX
comments_count: 2
repository_stars: 28460
---
Ensure API parameters have clear semantic meaning and avoid sending null values for optional fields. When designing API endpoints, use parameter names that accurately reflect their purpose and data type. For optional fields, omit them entirely from the payload rather than setting them to null, as this creates cleaner API contracts and prevents confusion about field requirements.

Example of what to avoid:
```javascript
// Bad: sending null values for optional fields
setSurveyValue('conditions', {
    ...survey.conditions,
    linkedFlagVariant: null, // Don't send null for optional fields
});
```
Example of proper approach:
```javascript
// Good: omit optional fields entirely
const { linkedFlagVariant, ...conditions } = survey.conditions;
setSurveyValue('conditions', {
    ...conditions, // linkedFlagVariant is omitted from the payload entirely
});
```
Additionally, ensure parameter names match their expected data types. For example, use `email` parameter when expecting email addresses rather than overloading `distinct_id` with email values, as this maintains semantic clarity and prevents data attribution issues.
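One way to enforce this systematically is a small payload helper that strips nullish entries before a request is sent — a generic sketch, not an existing PostHog utility:
```typescript
// Hypothetical helper: drop null/undefined fields so optional values are
// omitted from the payload instead of being sent as null
function omitNullish<T extends Record<string, unknown>>(obj: T): Partial<T> {
    return Object.fromEntries(
        Object.entries(obj).filter(([, value]) => value != null)
    ) as Partial<T>
}

// Usage: the optional field disappears from the request body entirely
omitNullish({ urlMatchType: 'exact', linkedFlagVariant: null })
// => { urlMatchType: 'exact' }
```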


@@ -0,0 +1,126 @@
[
{
"discussion_id": "2280238487",
"pr_number": 36692,
"pr_file": "frontend/src/lib/api.ts",
"created_at": "2025-08-16T04:55:42+00:00",
"commented_code": "}> {\n return await new ApiRequest().dataWarehouse().withAction('total_rows_stats').get(options)\n },\n+\n+ async recentActivity(options?: ApiMethodOptions & { limit?: number }): Promise<{\n+ activities: Array<{\n+ id: string\n+ type: 'external_data_sync' | 'materialized_view'",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2280238487",
"repo_full_name": "PostHog/posthog",
"pr_number": 36692,
"pr_file": "frontend/src/lib/api.ts",
"discussion_id": "2280238487",
"commented_code": "@@ -3356,6 +3356,26 @@ const api = {\n }> {\n return await new ApiRequest().dataWarehouse().withAction('total_rows_stats').get(options)\n },\n+\n+ async recentActivity(options?: ApiMethodOptions & { limit?: number }): Promise<{\n+ activities: Array<{\n+ id: string\n+ type: 'external_data_sync' | 'materialized_view'",
"comment_created_at": "2025-08-16T04:55:42+00:00",
"comment_author": "naumaanh",
"comment_body": "this needs to be fixed. its not external_data_sync, its going to be the name of the sync, ex Stripe. probably best to set this as a String",
"pr_file_module": null
}
]
},
{
"discussion_id": "2283820469",
"pr_number": 36692,
"pr_file": "frontend/src/lib/api.ts",
"created_at": "2025-08-19T01:26:12+00:00",
"commented_code": "dataWarehouse: {\n async total_rows_stats(options?: ApiMethodOptions): Promise<{\n- billingAvailable: boolean\n- billingInterval: string\n- billingPeriodEnd: string\n- billingPeriodStart: string\n- materializedRowsInBillingPeriod: number\n- totalRows: number\n- trackedBillingRows: number\n- pendingBillingRows: number\n+ billing_available: boolean\n+ billing_interval: string\n+ billing_period_end: string\n+ billing_period_start: string\n+ materialized_rows_in_billing_period: number\n+ total_rows: number\n+ tracked_billing_rows: number\n+ pending_billing_rows: number\n }> {\n return await new ApiRequest().dataWarehouse().withAction('total_rows_stats').get(options)\n },\n+\n+ async recentActivity(options?: ApiMethodOptions & { limit?: number; offset?: number }): Promise<{\n+ results: Array<{",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2283820469",
"repo_full_name": "PostHog/posthog",
"pr_number": 36692,
"pr_file": "frontend/src/lib/api.ts",
"discussion_id": "2283820469",
"commented_code": "@@ -3348,17 +3348,49 @@ const api = {\n \n dataWarehouse: {\n async total_rows_stats(options?: ApiMethodOptions): Promise<{\n- billingAvailable: boolean\n- billingInterval: string\n- billingPeriodEnd: string\n- billingPeriodStart: string\n- materializedRowsInBillingPeriod: number\n- totalRows: number\n- trackedBillingRows: number\n- pendingBillingRows: number\n+ billing_available: boolean\n+ billing_interval: string\n+ billing_period_end: string\n+ billing_period_start: string\n+ materialized_rows_in_billing_period: number\n+ total_rows: number\n+ tracked_billing_rows: number\n+ pending_billing_rows: number\n }> {\n return await new ApiRequest().dataWarehouse().withAction('total_rows_stats').get(options)\n },\n+\n+ async recentActivity(options?: ApiMethodOptions & { limit?: number; offset?: number }): Promise<{\n+ results: Array<{",
"comment_created_at": "2025-08-19T01:26:12+00:00",
"comment_author": "EDsCODE",
"comment_body": "There's a `PaginatedResponse` type that is pretty standardized for all paginatable APIs. Should change this to match that. Also add a type for the result objects themselves",
"pr_file_module": null
}
]
},
{
"discussion_id": "2275714281",
"pr_number": 36271,
"pr_file": "plugin-server/src/cdp/services/hogflows/actions/hog_function.ts",
"created_at": "2025-08-14T07:23:12+00:00",
"commented_code": "} from '../../../types'\n import { HogExecutorService } from '../../hog-executor.service'\n import { HogFunctionTemplateManagerService } from '../../managers/hog-function-template-manager.service'\n+import { RecipientPreferencesService } from '../../messaging/recipient-preferences.service'\n import { findContinueAction } from '../hogflow-utils'\n import { ActionHandler, ActionHandlerResult } from './action.interface'\n \n export class HogFunctionHandler implements ActionHandler {\n constructor(\n private hub: Hub,\n private hogFunctionExecutor: HogExecutorService,\n- private hogFunctionTemplateManager: HogFunctionTemplateManagerService\n+ private hogFunctionTemplateManager: HogFunctionTemplateManagerService,\n+ private recipientPreferencesService: RecipientPreferencesService\n ) {}\n \n async execute(\n invocation: CyclotronJobInvocationHogFlow,\n- action: Extract<HogFlowAction, { type: 'function' }>,\n+ action: Extract<\n+ HogFlowAction,\n+ { type: 'function' | 'function_email' | 'function_sms' | 'function_slack' | 'function_webhook' }\n+ >,",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2275714281",
"repo_full_name": "PostHog/posthog",
"pr_number": 36271,
"pr_file": "plugin-server/src/cdp/services/hogflows/actions/hog_function.ts",
"discussion_id": "2275714281",
"commented_code": "@@ -12,19 +12,24 @@ import {\n } from '../../../types'\n import { HogExecutorService } from '../../hog-executor.service'\n import { HogFunctionTemplateManagerService } from '../../managers/hog-function-template-manager.service'\n+import { RecipientPreferencesService } from '../../messaging/recipient-preferences.service'\n import { findContinueAction } from '../hogflow-utils'\n import { ActionHandler, ActionHandlerResult } from './action.interface'\n \n export class HogFunctionHandler implements ActionHandler {\n constructor(\n private hub: Hub,\n private hogFunctionExecutor: HogExecutorService,\n- private hogFunctionTemplateManager: HogFunctionTemplateManagerService\n+ private hogFunctionTemplateManager: HogFunctionTemplateManagerService,\n+ private recipientPreferencesService: RecipientPreferencesService\n ) {}\n \n async execute(\n invocation: CyclotronJobInvocationHogFlow,\n- action: Extract<HogFlowAction, { type: 'function' }>,\n+ action: Extract<\n+ HogFlowAction,\n+ { type: 'function' | 'function_email' | 'function_sms' | 'function_slack' | 'function_webhook' }\n+ >,",
"comment_created_at": "2025-08-14T07:23:12+00:00",
"comment_author": "meikelmosby",
"comment_body": "since we are re-using this maybe something like \n\n```\ntype FunctionActionType =\n | 'function'\n | 'function_email'\n | 'function_sms'\n | 'function_slack'\n | 'function_webhook';\n\ntype Action = Extract<HogFlowAction, { type: FunctionActionType }>;\n```\nbit easier to parse and we can re-use it.",
"pr_file_module": null
}
]
},
{
"discussion_id": "2260189959",
"pr_number": 36002,
"pr_file": "frontend/src/scenes/data-warehouse/externalDataSourcesLogic.ts",
"created_at": "2025-08-07T12:36:41+00:00",
"commented_code": "+import { actions, kea, listeners, path, reducers } from 'kea'\n+import { loaders } from 'kea-loaders'\n+import api, { ApiMethodOptions, PaginatedResponse } from 'lib/api'\n+\n+import { ExternalDataSource } from '~/types'\n+\n+import type { externalDataSourcesLogicType } from './externalDataSourcesLogicType'\n+\n+export const externalDataSourcesLogic = kea<externalDataSourcesLogicType>([\n+ path(['scenes', 'data-warehouse', 'externalDataSourcesLogic']),\n+ actions({\n+ abortAnyRunningQuery: true,\n+ }),\n+ loaders(({ cache, values, actions }) => ({\n+ dataWarehouseSources: [\n+ null as PaginatedResponse<ExternalDataSource> | null,\n+ {\n+ loadSources: async (_, breakpoint) => {\n+ await breakpoint(300)\n+ actions.abortAnyRunningQuery()\n+\n+ cache.abortController = new AbortController()\n+ const methodOptions: ApiMethodOptions = {\n+ signal: cache.abortController.signal,\n+ }\n+ const res = await api.externalDataSources.list(methodOptions)\n+ breakpoint()\n+\n+ cache.abortController = null\n+\n+ return res\n+ },\n+ updateSource: async (source: ExternalDataSource) => {\n+ const updatedSource = await api.externalDataSources.update(source.id, source)\n+ return {\n+ ...values.dataWarehouseSources,\n+ results:\n+ values.dataWarehouseSources?.results.map((s) => (s.id === updatedSource.id ? source : s)) ||",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2260189959",
"repo_full_name": "PostHog/posthog",
"pr_number": 36002,
"pr_file": "frontend/src/scenes/data-warehouse/externalDataSourcesLogic.ts",
"discussion_id": "2260189959",
"commented_code": "@@ -0,0 +1,63 @@\n+import { actions, kea, listeners, path, reducers } from 'kea'\n+import { loaders } from 'kea-loaders'\n+import api, { ApiMethodOptions, PaginatedResponse } from 'lib/api'\n+\n+import { ExternalDataSource } from '~/types'\n+\n+import type { externalDataSourcesLogicType } from './externalDataSourcesLogicType'\n+\n+export const externalDataSourcesLogic = kea<externalDataSourcesLogicType>([\n+ path(['scenes', 'data-warehouse', 'externalDataSourcesLogic']),\n+ actions({\n+ abortAnyRunningQuery: true,\n+ }),\n+ loaders(({ cache, values, actions }) => ({\n+ dataWarehouseSources: [\n+ null as PaginatedResponse<ExternalDataSource> | null,\n+ {\n+ loadSources: async (_, breakpoint) => {\n+ await breakpoint(300)\n+ actions.abortAnyRunningQuery()\n+\n+ cache.abortController = new AbortController()\n+ const methodOptions: ApiMethodOptions = {\n+ signal: cache.abortController.signal,\n+ }\n+ const res = await api.externalDataSources.list(methodOptions)\n+ breakpoint()\n+\n+ cache.abortController = null\n+\n+ return res\n+ },\n+ updateSource: async (source: ExternalDataSource) => {\n+ const updatedSource = await api.externalDataSources.update(source.id, source)\n+ return {\n+ ...values.dataWarehouseSources,\n+ results:\n+ values.dataWarehouseSources?.results.map((s) => (s.id === updatedSource.id ? source : s)) ||",
"comment_created_at": "2025-08-07T12:36:41+00:00",
"comment_author": "Gilbert09",
"comment_body": "We wanna be returning `updatedSource` here instead of `source` in this ternary ",
"pr_file_module": null
},
{
"comment_id": "2260603508",
"repo_full_name": "PostHog/posthog",
"pr_number": 36002,
"pr_file": "frontend/src/scenes/data-warehouse/externalDataSourcesLogic.ts",
"discussion_id": "2260189959",
"commented_code": "@@ -0,0 +1,63 @@\n+import { actions, kea, listeners, path, reducers } from 'kea'\n+import { loaders } from 'kea-loaders'\n+import api, { ApiMethodOptions, PaginatedResponse } from 'lib/api'\n+\n+import { ExternalDataSource } from '~/types'\n+\n+import type { externalDataSourcesLogicType } from './externalDataSourcesLogicType'\n+\n+export const externalDataSourcesLogic = kea<externalDataSourcesLogicType>([\n+ path(['scenes', 'data-warehouse', 'externalDataSourcesLogic']),\n+ actions({\n+ abortAnyRunningQuery: true,\n+ }),\n+ loaders(({ cache, values, actions }) => ({\n+ dataWarehouseSources: [\n+ null as PaginatedResponse<ExternalDataSource> | null,\n+ {\n+ loadSources: async (_, breakpoint) => {\n+ await breakpoint(300)\n+ actions.abortAnyRunningQuery()\n+\n+ cache.abortController = new AbortController()\n+ const methodOptions: ApiMethodOptions = {\n+ signal: cache.abortController.signal,\n+ }\n+ const res = await api.externalDataSources.list(methodOptions)\n+ breakpoint()\n+\n+ cache.abortController = null\n+\n+ return res\n+ },\n+ updateSource: async (source: ExternalDataSource) => {\n+ const updatedSource = await api.externalDataSources.update(source.id, source)\n+ return {\n+ ...values.dataWarehouseSources,\n+ results:\n+ values.dataWarehouseSources?.results.map((s) => (s.id === updatedSource.id ? source : s)) ||",
"comment_created_at": "2025-08-07T15:01:04+00:00",
"comment_author": "naumaanh",
"comment_body": "This was actually one of the things that I think the greptile bot flagged to me before, but I was a bit unsure about it since the actual code in production actually returned `source` in the ternary instead of `updatedSource`. \r\n\r\nWhen I look at the actual object, the only difference I saw was that the ordering of the `schemas` array was different. Everything else looked the same. Would changing this cause any issues? @Gilbert09 ",
"pr_file_module": null
},
{
"comment_id": "2261479527",
"repo_full_name": "PostHog/posthog",
"pr_number": 36002,
"pr_file": "frontend/src/scenes/data-warehouse/externalDataSourcesLogic.ts",
"discussion_id": "2260189959",
"commented_code": "@@ -0,0 +1,63 @@\n+import { actions, kea, listeners, path, reducers } from 'kea'\n+import { loaders } from 'kea-loaders'\n+import api, { ApiMethodOptions, PaginatedResponse } from 'lib/api'\n+\n+import { ExternalDataSource } from '~/types'\n+\n+import type { externalDataSourcesLogicType } from './externalDataSourcesLogicType'\n+\n+export const externalDataSourcesLogic = kea<externalDataSourcesLogicType>([\n+ path(['scenes', 'data-warehouse', 'externalDataSourcesLogic']),\n+ actions({\n+ abortAnyRunningQuery: true,\n+ }),\n+ loaders(({ cache, values, actions }) => ({\n+ dataWarehouseSources: [\n+ null as PaginatedResponse<ExternalDataSource> | null,\n+ {\n+ loadSources: async (_, breakpoint) => {\n+ await breakpoint(300)\n+ actions.abortAnyRunningQuery()\n+\n+ cache.abortController = new AbortController()\n+ const methodOptions: ApiMethodOptions = {\n+ signal: cache.abortController.signal,\n+ }\n+ const res = await api.externalDataSources.list(methodOptions)\n+ breakpoint()\n+\n+ cache.abortController = null\n+\n+ return res\n+ },\n+ updateSource: async (source: ExternalDataSource) => {\n+ const updatedSource = await api.externalDataSources.update(source.id, source)\n+ return {\n+ ...values.dataWarehouseSources,\n+ results:\n+ values.dataWarehouseSources?.results.map((s) => (s.id === updatedSource.id ? source : s)) ||",
"comment_created_at": "2025-08-07T21:43:53+00:00",
"comment_author": "Gilbert09",
"comment_body": "Shouldn't do - the return object from an update call is the most up to date version of the object ",
"pr_file_module": null
},
{
"comment_id": "2261502838",
"repo_full_name": "PostHog/posthog",
"pr_number": 36002,
"pr_file": "frontend/src/scenes/data-warehouse/externalDataSourcesLogic.ts",
"discussion_id": "2260189959",
"commented_code": "@@ -0,0 +1,63 @@\n+import { actions, kea, listeners, path, reducers } from 'kea'\n+import { loaders } from 'kea-loaders'\n+import api, { ApiMethodOptions, PaginatedResponse } from 'lib/api'\n+\n+import { ExternalDataSource } from '~/types'\n+\n+import type { externalDataSourcesLogicType } from './externalDataSourcesLogicType'\n+\n+export const externalDataSourcesLogic = kea<externalDataSourcesLogicType>([\n+ path(['scenes', 'data-warehouse', 'externalDataSourcesLogic']),\n+ actions({\n+ abortAnyRunningQuery: true,\n+ }),\n+ loaders(({ cache, values, actions }) => ({\n+ dataWarehouseSources: [\n+ null as PaginatedResponse<ExternalDataSource> | null,\n+ {\n+ loadSources: async (_, breakpoint) => {\n+ await breakpoint(300)\n+ actions.abortAnyRunningQuery()\n+\n+ cache.abortController = new AbortController()\n+ const methodOptions: ApiMethodOptions = {\n+ signal: cache.abortController.signal,\n+ }\n+ const res = await api.externalDataSources.list(methodOptions)\n+ breakpoint()\n+\n+ cache.abortController = null\n+\n+ return res\n+ },\n+ updateSource: async (source: ExternalDataSource) => {\n+ const updatedSource = await api.externalDataSources.update(source.id, source)\n+ return {\n+ ...values.dataWarehouseSources,\n+ results:\n+ values.dataWarehouseSources?.results.map((s) => (s.id === updatedSource.id ? source : s)) ||",
"comment_created_at": "2025-08-07T21:51:11+00:00",
"comment_author": "naumaanh",
"comment_body": "Got it, makes sense, I'll adjust this. Thanks! ",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,47 @@
---
title: API response standardization
description: Ensure API responses follow established patterns and use proper typing.
Always use standardized response types like `PaginatedResponse` for paginated endpoints,
avoid hardcoded union types for dynamic values that should be strings, and return
fresh data from API responses rather than stale local data.
repository: PostHog/posthog
label: API
language: TypeScript
comments_count: 4
repository_stars: 28460
---
Ensure API responses follow established patterns and use proper typing. Always use standardized response types like `PaginatedResponse` for paginated endpoints, avoid hardcoded union types for dynamic values that should be strings, and return fresh data from API responses rather than stale local data.
Key practices:
- Use established response patterns: `PaginatedResponse<T>` for paginated APIs
- Prefer flexible string types over hardcoded unions for dynamic values (e.g., sync names like "Stripe")
- Return updated data from API responses, not the original request data
- Create reusable type definitions instead of complex inline types
Example:
```typescript
// Bad: hardcoded union types and a complex inline response type
async recentActivity(): Promise<{
    activities: Array<{
        type: 'external_data_sync' | 'materialized_view' // Too rigid for dynamic values
    }>
}> {
    // ...
}

// Good: a reusable union where the set of values is genuinely fixed...
type FunctionActionType = 'function' | 'function_email' | 'function_sms' | 'function_slack' | 'function_webhook'

// ...and a standardized paginated response with flexible types elsewhere
interface ActivityItem {
    type: string // Dynamic sync names like "Stripe" or "Salesforce" stay strings
}

async recentActivity(): Promise<PaginatedResponse<ActivityItem>> {
    // ...
}

// Always return fresh API data, not the original request payload
const updatedSource = await api.externalDataSources.update(source.id, source)
return {
    ...values.dataWarehouseSources,
    results:
        values.dataWarehouseSources?.results.map((s) =>
            s.id === updatedSource.id ? updatedSource : s // Use the server's copy
        ) || [],
}
```

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,45 @@
---
title: Break down large functions
description: Large functions that handle multiple responsibilities should be decomposed
into smaller, focused functions to improve readability, maintainability, and facilitate
code reviews.
repository: PostHog/posthog
label: Code Style
language: Python
comments_count: 5
repository_stars: 28460
---
Large functions that handle multiple responsibilities should be decomposed into smaller, focused functions to improve readability, maintainability, and facilitate code reviews.
When a function becomes difficult to understand at a glance or handles multiple distinct operations, consider extracting logical chunks into separate methods. This is especially important when functions exceed ~50 lines or when reviewers comment that the code would benefit from splitting.
**Example of improvement:**
```python
# Before: one large function that parses, executes, formats, and handles
# errors inline (condensed here for brevity)
def _process_insight_for_evaluation(self, insight: Insight, query_executor: AssistantQueryExecutor) -> dict:
    # ~100 lines of parsing, execution, visualization, and error handling
    ...

# After: a short orchestrator that delegates to focused helpers
def _process_insight_for_evaluation(self, insight: Insight, query_executor: AssistantQueryExecutor) -> dict:
    insight_info = self._create_base_insight_info(insight)
    try:
        query_dict = self._parse_insight_query(insight)
        if query_dict:
            self._execute_and_update_info(insight_info, query_dict, query_executor)
            self._add_visualization_message(insight_info, insight)
        else:
            self._handle_no_query(insight_info, insight)
    except Exception as e:
        self._handle_evaluation_error(insight_info, insight, e)
    return insight_info

def _parse_insight_query(self, insight: Insight) -> dict | None:
    """Focused helper: parse the insight's stored query, or None if absent."""
    ...

def _execute_and_update_info(self, insight_info: dict, query_dict: dict, executor: AssistantQueryExecutor) -> None:
    """Focused helper: execute the query and merge results into insight_info."""
    ...
```
This approach makes each function easier to test, understand, and modify independently. It also makes code reviews more focused since reviewers can evaluate each logical operation separately.
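As a rough sketch of that testing benefit, each extracted helper can be exercised in isolation (the class and fixture names below are hypothetical, not actual PostHog code):
```python
def test_parse_insight_query_handles_missing_query():
    # Only the parsing concern is under test; no query execution or
    # error handling needs to be wired up (InsightEvaluator is a made-up name)
    evaluator = InsightEvaluator()
    insight = Insight(query=None)

    assert evaluator._parse_insight_query(insight) is None
```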

View File

@@ -0,0 +1,232 @@
[
{
"discussion_id": "2277914735",
"pr_number": 36608,
"pr_file": "posthog/hogql/database/database.py",
"created_at": "2025-08-14T23:12:28+00:00",
"commented_code": "with timings.measure(\"data_warehouse_tables\"):\n with timings.measure(\"select\"):\n- tables = list(\n- DataWarehouseTable.objects.filter(team_id=team.pk)\n- .exclude(deleted=True)\n- .select_related(\"credential\", \"external_data_source\")\n- )\n+ if cache_enabled:\n+ tables = list(\n+ DataWarehouseTable.objects.filter(team_id=team.pk)\n+ .exclude(deleted=True)\n+ .select_related(\"credential\", \"external_data_source\")\n+ .fetch_cached(team_id=team_id or team.pk, key_prefix=CACHE_KEY_PREFIX)\n+ )\n+ else:\n+ tables = list(\n+ DataWarehouseTable.objects.filter(team_id=team.pk)\n+ .exclude(deleted=True)\n+ .select_related(\"credential\", \"external_data_source\")\n+ )",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2277914735",
"repo_full_name": "PostHog/posthog",
"pr_number": 36608,
"pr_file": "posthog/hogql/database/database.py",
"discussion_id": "2277914735",
"commented_code": "@@ -541,11 +563,19 @@ def create_hogql_database(\n \n with timings.measure(\"data_warehouse_tables\"):\n with timings.measure(\"select\"):\n- tables = list(\n- DataWarehouseTable.objects.filter(team_id=team.pk)\n- .exclude(deleted=True)\n- .select_related(\"credential\", \"external_data_source\")\n- )\n+ if cache_enabled:\n+ tables = list(\n+ DataWarehouseTable.objects.filter(team_id=team.pk)\n+ .exclude(deleted=True)\n+ .select_related(\"credential\", \"external_data_source\")\n+ .fetch_cached(team_id=team_id or team.pk, key_prefix=CACHE_KEY_PREFIX)\n+ )\n+ else:\n+ tables = list(\n+ DataWarehouseTable.objects.filter(team_id=team.pk)\n+ .exclude(deleted=True)\n+ .select_related(\"credential\", \"external_data_source\")\n+ )",
"comment_created_at": "2025-08-14T23:12:28+00:00",
"comment_author": "rafaeelaudibert",
"comment_body": "And, likewise\n\n\n```suggestion\n tables = DataWarehouseTable.objects.filter(team_id=team.pk)\n .exclude(deleted=True)\n .select_related(\"credential\", \"external_data_source\")\n if cache_enabled:\n tables = tables.fetch_cached(team_id=team_id or team.pk, key_prefix=CACHE_KEY_PREFIX)\n```",
"pr_file_module": null
}
]
},
{
"discussion_id": "2269379019",
"pr_number": 35726,
"pr_file": "ee/hogai/graph/insights/nodes.py",
"created_at": "2025-08-12T10:14:51+00:00",
"commented_code": "return \"\n\".join(formatted_insights)\n \n- def _parse_insight_ids(self, response_content: str) -> list[int]:\n- \"\"\"Parse insight IDs from LLM response.\"\"\"\n- import re\n+ def _get_all_loaded_insight_ids(self) -> set[int]:\n+ \"\"\"Get all insight IDs from loaded pages.\"\"\"\n+ all_ids = set()\n+ for page_insights in self._loaded_pages.values():\n+ for insight in page_insights:\n+ all_ids.add(insight.id)\n+ return all_ids\n+\n+ def _find_insight_by_id(self, insight_id: int) -> Insight | None:",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2269379019",
"repo_full_name": "PostHog/posthog",
"pr_number": 35726,
"pr_file": "ee/hogai/graph/insights/nodes.py",
"discussion_id": "2269379019",
"commented_code": "@@ -242,62 +307,254 @@ def _format_insights_page(self, page_number: int) -> str:\n \n return \"\\n\".join(formatted_insights)\n \n- def _parse_insight_ids(self, response_content: str) -> list[int]:\n- \"\"\"Parse insight IDs from LLM response.\"\"\"\n- import re\n+ def _get_all_loaded_insight_ids(self) -> set[int]:\n+ \"\"\"Get all insight IDs from loaded pages.\"\"\"\n+ all_ids = set()\n+ for page_insights in self._loaded_pages.values():\n+ for insight in page_insights:\n+ all_ids.add(insight.id)\n+ return all_ids\n+\n+ def _find_insight_by_id(self, insight_id: int) -> Insight | None:",
"comment_created_at": "2025-08-12T10:14:51+00:00",
"comment_author": "sortafreel",
"comment_body": "It seems like O(n2) complexity and is being called multiple times. Would it make sense to cache it somehow, or are the loaded pages too dynamic? Or maybe just build the `self` index dictionary on the go? With thousands of insights it can get a bit expensive, as I see it.",
"pr_file_module": null
},
{
"comment_id": "2273402677",
"repo_full_name": "PostHog/posthog",
"pr_number": 35726,
"pr_file": "ee/hogai/graph/insights/nodes.py",
"discussion_id": "2269379019",
"commented_code": "@@ -242,62 +307,254 @@ def _format_insights_page(self, page_number: int) -> str:\n \n return \"\\n\".join(formatted_insights)\n \n- def _parse_insight_ids(self, response_content: str) -> list[int]:\n- \"\"\"Parse insight IDs from LLM response.\"\"\"\n- import re\n+ def _get_all_loaded_insight_ids(self) -> set[int]:\n+ \"\"\"Get all insight IDs from loaded pages.\"\"\"\n+ all_ids = set()\n+ for page_insights in self._loaded_pages.values():\n+ for insight in page_insights:\n+ all_ids.add(insight.id)\n+ return all_ids\n+\n+ def _find_insight_by_id(self, insight_id: int) -> Insight | None:",
"comment_created_at": "2025-08-13T13:03:30+00:00",
"comment_author": "tatoalo",
"comment_body": "Yeah good point! Added a super simple insight ID caching mechanism.",
"pr_file_module": null
}
]
},
{
"discussion_id": "2277698056",
"pr_number": 36663,
"pr_file": "posthog/hogql_queries/query_runner.py",
"created_at": "2025-08-14T20:50:58+00:00",
"commented_code": "self.__post_init__()\n \n \n-class QueryRunnerWithHogQLContext(QueryRunner):\n+# Type constraint for analytics query responses\n+AR = TypeVar(\"AR\", bound=AnalyticsQueryResponseBase)\n+\n+\n+class AnalyticsQueryRunner(QueryRunner[Q, AR, CR], Generic[Q, AR, CR]):\n+ \"\"\"\n+ QueryRunner subclass that constrains the response type to AnalyticsQueryResponseBase.\n+ \"\"\"\n+\n+ def calculate(self) -> AR:\n+ response = self._calculate()\n+ if not self.modifiers.timings:",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2277698056",
"repo_full_name": "PostHog/posthog",
"pr_number": 36663,
"pr_file": "posthog/hogql_queries/query_runner.py",
"discussion_id": "2277698056",
"commented_code": "@@ -1134,7 +1138,23 @@ def apply_dashboard_filters(self, dashboard_filter: DashboardFilter):\n self.__post_init__()\n \n \n-class QueryRunnerWithHogQLContext(QueryRunner):\n+# Type constraint for analytics query responses\n+AR = TypeVar(\"AR\", bound=AnalyticsQueryResponseBase)\n+\n+\n+class AnalyticsQueryRunner(QueryRunner[Q, AR, CR], Generic[Q, AR, CR]):\n+ \"\"\"\n+ QueryRunner subclass that constrains the response type to AnalyticsQueryResponseBase.\n+ \"\"\"\n+\n+ def calculate(self) -> AR:\n+ response = self._calculate()\n+ if not self.modifiers.timings:",
"comment_created_at": "2025-08-14T20:50:58+00:00",
"comment_author": "rafaeelaudibert",
"comment_body": "We could do this differently by updating `HogQLTimings` to be a noop when this isnt set, avoid spending time timing stuff",
"pr_file_module": null
},
{
"comment_id": "2277738469",
"repo_full_name": "PostHog/posthog",
"pr_number": 36663,
"pr_file": "posthog/hogql_queries/query_runner.py",
"discussion_id": "2277698056",
"commented_code": "@@ -1134,7 +1138,23 @@ def apply_dashboard_filters(self, dashboard_filter: DashboardFilter):\n self.__post_init__()\n \n \n-class QueryRunnerWithHogQLContext(QueryRunner):\n+# Type constraint for analytics query responses\n+AR = TypeVar(\"AR\", bound=AnalyticsQueryResponseBase)\n+\n+\n+class AnalyticsQueryRunner(QueryRunner[Q, AR, CR], Generic[Q, AR, CR]):\n+ \"\"\"\n+ QueryRunner subclass that constrains the response type to AnalyticsQueryResponseBase.\n+ \"\"\"\n+\n+ def calculate(self) -> AR:\n+ response = self._calculate()\n+ if not self.modifiers.timings:",
"comment_created_at": "2025-08-14T21:16:19+00:00",
"comment_author": "aspicer",
"comment_body": "This was my original implementation, but @Gilbert09 suggested that it might be confusing and lead to code issues. The timings aren't that computationally intensive, so this PR just removes them at the proper layer (the query runner layer). \r\n\r\nSee discussion here: https://github.com/PostHog/posthog/pull/36600",
"pr_file_module": null
},
{
"comment_id": "2277758429",
"repo_full_name": "PostHog/posthog",
"pr_number": 36663,
"pr_file": "posthog/hogql_queries/query_runner.py",
"discussion_id": "2277698056",
"commented_code": "@@ -1134,7 +1138,23 @@ def apply_dashboard_filters(self, dashboard_filter: DashboardFilter):\n self.__post_init__()\n \n \n-class QueryRunnerWithHogQLContext(QueryRunner):\n+# Type constraint for analytics query responses\n+AR = TypeVar(\"AR\", bound=AnalyticsQueryResponseBase)\n+\n+\n+class AnalyticsQueryRunner(QueryRunner[Q, AR, CR], Generic[Q, AR, CR]):\n+ \"\"\"\n+ QueryRunner subclass that constrains the response type to AnalyticsQueryResponseBase.\n+ \"\"\"\n+\n+ def calculate(self) -> AR:\n+ response = self._calculate()\n+ if not self.modifiers.timings:",
"comment_created_at": "2025-08-14T21:29:57+00:00",
"comment_author": "rafaeelaudibert",
"comment_body": "Thanks for the extra context, I agree with the approach :)",
"pr_file_module": null
}
]
},
{
"discussion_id": "2276175419",
"pr_number": 36600,
"pr_file": "posthog/hogql/timings.py",
"created_at": "2025-08-14T10:02:41+00:00",
"commented_code": "from time import perf_counter\n from contextlib import contextmanager\n+from collections.abc import Iterator\n \n from posthog.schema import QueryTiming\n \n \n+class HogQLTimings:\n+ \"\"\"No-op version of HogQLTimings that doesn't collect timing data.\"\"\"",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2276175419",
"repo_full_name": "PostHog/posthog",
"pr_number": 36600,
"pr_file": "posthog/hogql/timings.py",
"discussion_id": "2276175419",
"commented_code": "@@ -1,12 +1,36 @@\n from time import perf_counter\n from contextlib import contextmanager\n+from collections.abc import Iterator\n \n from posthog.schema import QueryTiming\n \n \n+class HogQLTimings:\n+ \"\"\"No-op version of HogQLTimings that doesn't collect timing data.\"\"\"",
"comment_created_at": "2025-08-14T10:02:41+00:00",
"comment_author": "Gilbert09",
"comment_body": "I don't think having an implicit no-op `HogQLTimings` is a good idea - we're just asking for confusion and bugs with this - \"why are timings not working, I'm using `HogQLTimings`\". I think I'd prefer we kept the implementation as-is, but update the `query.py` file to conditionally set timings to `None` instead when `debug` is not True",
"pr_file_module": null
},
{
"comment_id": "2277205963",
"repo_full_name": "PostHog/posthog",
"pr_number": 36600,
"pr_file": "posthog/hogql/timings.py",
"discussion_id": "2276175419",
"commented_code": "@@ -1,12 +1,36 @@\n from time import perf_counter\n from contextlib import contextmanager\n+from collections.abc import Iterator\n \n from posthog.schema import QueryTiming\n \n \n+class HogQLTimings:\n+ \"\"\"No-op version of HogQLTimings that doesn't collect timing data.\"\"\"",
"comment_created_at": "2025-08-14T17:00:37+00:00",
"comment_author": "aspicer",
"comment_body": "The issue here is that timings runs across the stack. It is implemented as a hogql modifier, but multiple things touch timings.\r\n\r\nTimings is generally instantiated in the init of query_runner.py\r\n\r\nFrom here, the code usually calls the calculate() or run() or to_query() method on a subclass of query_runner.\r\n\r\nThe query_runner doesn't have a return type bound other than BaseModel. You could theoretically use it for anything. So it doesn't necessarily have a hogql field or a timings field.\r\n\r\nSo the question is where to remove the timings data?\r\n\r\nSince it's a hogql modifier, you could remove the hogql and the timings return value in execute_hogql_query, but then sometimes the actual query_runner one level higher (see actors_query_runner) adds things to it.\r\n\r\nHow can I stop these various query runners from returning timings from their calculate methods without having to make every query runner have logic in it that handles the modifier case of debug?\r\n\r\nWe could do an `if hasattr delattr` thing somewhere in process_query_model but that seems hackier. Open to ideas.",
"pr_file_module": null
},
{
"comment_id": "2277234879",
"repo_full_name": "PostHog/posthog",
"pr_number": 36600,
"pr_file": "posthog/hogql/timings.py",
"discussion_id": "2276175419",
"commented_code": "@@ -1,12 +1,36 @@\n from time import perf_counter\n from contextlib import contextmanager\n+from collections.abc import Iterator\n \n from posthog.schema import QueryTiming\n \n \n+class HogQLTimings:\n+ \"\"\"No-op version of HogQLTimings that doesn't collect timing data.\"\"\"",
"comment_created_at": "2025-08-14T17:13:17+00:00",
"comment_author": "Gilbert09",
"comment_body": "To be fair, I probably wouldn't tie this change to the `DEBUG` modifier - it's very non-descript for what actually happens. I'd probably add a timings modifier and base everything off that. \r\n\r\n> So the question is where to remove the timings data?\r\n\r\nHonestly, at the top level seems sensible to me - if that means we have to modify every query runner to handle the case of a missing `timings` object, then that's an appropriate approach imo. Query Runner timings != HogQL timings - we just happen to merge the hogql timings into the query runner timings. Do you want to remove only the hogql timings from the query results, or both hogql and query runner? Because the answer to that should help with where the logic goes",
"pr_file_module": null
},
{
"comment_id": "2277263915",
"repo_full_name": "PostHog/posthog",
"pr_number": 36600,
"pr_file": "posthog/hogql/timings.py",
"discussion_id": "2276175419",
"commented_code": "@@ -1,12 +1,36 @@\n from time import perf_counter\n from contextlib import contextmanager\n+from collections.abc import Iterator\n \n from posthog.schema import QueryTiming\n \n \n+class HogQLTimings:\n+ \"\"\"No-op version of HogQLTimings that doesn't collect timing data.\"\"\"",
"comment_created_at": "2025-08-14T17:23:12+00:00",
"comment_author": "aspicer",
"comment_body": "Yes, we don't actually care what happens in HogQL directly from the client side. \r\nWe want to remove timings from query runner calculate calls.\r\n\r\nI guess this could be done by refactoring all query runners that return things that inherit their return from AnalyticsQueryResponseBase and handling it that way without having to make the query runners completely aware of it. I'll look into it. Thanks for the feedback.",
"pr_file_module": null
},
{
"comment_id": "2277420989",
"repo_full_name": "PostHog/posthog",
"pr_number": 36600,
"pr_file": "posthog/hogql/timings.py",
"discussion_id": "2276175419",
"commented_code": "@@ -1,12 +1,36 @@\n from time import perf_counter\n from contextlib import contextmanager\n+from collections.abc import Iterator\n \n from posthog.schema import QueryTiming\n \n \n+class HogQLTimings:\n+ \"\"\"No-op version of HogQLTimings that doesn't collect timing data.\"\"\"",
"comment_created_at": "2025-08-14T18:25:29+00:00",
"comment_author": "Gilbert09",
"comment_body": "Thinking about this some more on my commute home. Why don't we just remove timings from the response object at the API layer? This is more of an API concern, right?",
"pr_file_module": null
},
{
"comment_id": "2277458679",
"repo_full_name": "PostHog/posthog",
"pr_number": 36600,
"pr_file": "posthog/hogql/timings.py",
"discussion_id": "2276175419",
"commented_code": "@@ -1,12 +1,36 @@\n from time import perf_counter\n from contextlib import contextmanager\n+from collections.abc import Iterator\n \n from posthog.schema import QueryTiming\n \n \n+class HogQLTimings:\n+ \"\"\"No-op version of HogQLTimings that doesn't collect timing data.\"\"\"",
"comment_created_at": "2025-08-14T18:44:44+00:00",
"comment_author": "aspicer",
"comment_body": "That makes sense but two issues:\r\n1. The original reason for doing this is to avoid caching timings, so it has to be at the query_runner level at least\r\n2. The query endpoint allows for any response type, only some of which are analytics queries that have the timings field.\r\n\r\nI think pushing it to the query_runner layer is the right call here, it's relatively straight forward and I'm handling it now. A lot cleaner with no changes to hogql or the timings object. ",
"pr_file_module": null
},
{
"comment_id": "2277460770",
"repo_full_name": "PostHog/posthog",
"pr_number": 36600,
"pr_file": "posthog/hogql/timings.py",
"discussion_id": "2276175419",
"commented_code": "@@ -1,12 +1,36 @@\n from time import perf_counter\n from contextlib import contextmanager\n+from collections.abc import Iterator\n \n from posthog.schema import QueryTiming\n \n \n+class HogQLTimings:\n+ \"\"\"No-op version of HogQLTimings that doesn't collect timing data.\"\"\"",
"comment_created_at": "2025-08-14T18:45:33+00:00",
"comment_author": "Gilbert09",
"comment_body": "Great, thank you!",
"pr_file_module": null
}
]
},
{
"discussion_id": "2245583189",
"pr_number": 35957,
"pr_file": "products/batch_exports/backend/temporal/metrics.py",
"created_at": "2025-07-31T14:45:30+00:00",
"commented_code": "\"interval\": interval,\n }\n \n- activity_attempt = activity_info.attempt\n meter = get_metric_meter(histogram_attributes)\n- hist = meter.create_histogram(\n- name=\"batch_exports_activity_attempt\",\n- description=\"Histogram tracking attempts made by critical batch export activities\",\n+\n+ try:\n+ with ExecutionTimeRecorder(\n+ \"batch_exports_activity_interval_execution_latency\",\n+ description=\"Histogram tracking execution latency for critical batch export activities by interval\",\n+ histogram_attributes=histogram_attributes,\n+ log=False,\n+ ):\n+ result = await super().execute_activity(input)\n+ finally:\n+ attempts_total_counter = meter.create_counter(\n+ name=\"batch_exports_activity_attempts\",\n+ description=\"Counter tracking every attempt at running an activity\",\n+ )\n+ attempts_total_counter.add(1)",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2245611026",
"repo_full_name": "PostHog/posthog",
"pr_number": 35957,
"pr_file": "products/batch_exports/backend/temporal/metrics.py",
"discussion_id": "2245583189",
"commented_code": "@@ -94,21 +94,30 @@ async def execute_activity(self, input: ExecuteActivityInput) -> typing.Any:\n \"interval\": interval,\n }\n \n- activity_attempt = activity_info.attempt\n meter = get_metric_meter(histogram_attributes)\n- hist = meter.create_histogram(\n- name=\"batch_exports_activity_attempt\",\n- description=\"Histogram tracking attempts made by critical batch export activities\",\n+\n+ try:\n+ with ExecutionTimeRecorder(\n+ \"batch_exports_activity_interval_execution_latency\",\n+ description=\"Histogram tracking execution latency for critical batch export activities by interval\",\n+ histogram_attributes=histogram_attributes,\n+ log=False,\n+ ):\n+ result = await super().execute_activity(input)\n+ finally:\n+ attempts_total_counter = meter.create_counter(\n+ name=\"batch_exports_activity_attempts\",\n+ description=\"Counter tracking every attempt at running an activity\",\n+ )\n+ attempts_total_counter.add(1)",
"comment_created_at": "2025-07-31T14:45:30+00:00",
"comment_author": "tomasfarias",
"comment_body": "hmm we can cache it if it becomes a problem, otherwise we have to keep it here as we don't have access to the attributes outside of this context.",
"pr_file_module": null
},
{
"comment_id": "2245628459",
"repo_full_name": "PostHog/posthog",
"pr_number": 35957,
"pr_file": "products/batch_exports/backend/temporal/metrics.py",
"discussion_id": "2245583189",
"commented_code": "@@ -94,21 +94,30 @@ async def execute_activity(self, input: ExecuteActivityInput) -> typing.Any:\n \"interval\": interval,\n }\n \n- activity_attempt = activity_info.attempt\n meter = get_metric_meter(histogram_attributes)\n- hist = meter.create_histogram(\n- name=\"batch_exports_activity_attempt\",\n- description=\"Histogram tracking attempts made by critical batch export activities\",\n+\n+ try:\n+ with ExecutionTimeRecorder(\n+ \"batch_exports_activity_interval_execution_latency\",\n+ description=\"Histogram tracking execution latency for critical batch export activities by interval\",\n+ histogram_attributes=histogram_attributes,\n+ log=False,\n+ ):\n+ result = await super().execute_activity(input)\n+ finally:\n+ attempts_total_counter = meter.create_counter(\n+ name=\"batch_exports_activity_attempts\",\n+ description=\"Counter tracking every attempt at running an activity\",\n+ )\n+ attempts_total_counter.add(1)",
"comment_created_at": "2025-07-31T14:52:08+00:00",
"comment_author": "tomasfarias",
"comment_body": "we'll monitor and decide later if this is a problem. `ExecutionTimeCounter` does the same and we haven't noticed impact.",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,59 @@
---
title: Cache expensive operations
description: Identify and eliminate redundant expensive operations by implementing
caching, memoization, or conditional execution. Look for repeated database queries,
complex calculations, object creation, or data processing that can be cached or
avoided entirely when not needed.
repository: PostHog/posthog
label: Performance Optimization
language: Python
comments_count: 5
repository_stars: 28460
---
Identify and eliminate redundant expensive operations by implementing caching, memoization, or conditional execution. Look for repeated database queries, complex calculations, object creation, or data processing that can be cached or avoided entirely when not needed.
Common patterns to optimize:
- **Repeated database queries**: Extract common query logic and cache results
- **Expensive lookups**: Cache lookup results to avoid O(n²) complexity when searching through collections multiple times
- **Conditional expensive operations**: Skip unnecessary work when results won't be used, e.g. timing calculations when debug mode is off (sketched after the first example below)
- **Object recreation**: Cache expensive-to-create objects like compiled regex patterns or metric meters (see the `lru_cache` sketch at the end)
Example of query optimization:
```python
# Before: Duplicated query logic
if cache_enabled:
tables = list(
DataWarehouseTable.objects.filter(team_id=team.pk)
.exclude(deleted=True)
.select_related("credential", "external_data_source")
.fetch_cached(team_id=team_id or team.pk, key_prefix=CACHE_KEY_PREFIX)
)
else:
tables = list(
DataWarehouseTable.objects.filter(team_id=team.pk)
.exclude(deleted=True)
.select_related("credential", "external_data_source")
)
# After: Extract common logic, apply caching conditionally
tables = (
    DataWarehouseTable.objects.filter(team_id=team.pk)
    .exclude(deleted=True)
    .select_related("credential", "external_data_source")
)
if cache_enabled:
    tables = tables.fetch_cached(team_id=team_id or team.pk, key_prefix=CACHE_KEY_PREFIX)
```
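The conditional-execution bullet can often be handled once, at a single layer, rather than in every caller. A minimal sketch following the query-runner approach from the discussions (the `modifiers.timings` flag is taken from that context):
```python
def calculate(self):
    response = self._calculate()
    # Timings are debug-only; strip them at the query-runner layer so they
    # are neither serialized into the response nor written to the cache
    if not self.modifiers.timings:
        response.timings = None
    return response
```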
Example of lookup caching:
```python
# Add caching to avoid O(n²) complexity
def _get_all_loaded_insight_ids(self) -> set[int]:
"""Get all insight IDs from loaded pages."""
if not hasattr(self, '_cached_insight_ids'):
all_ids = set()
for page_insights in self._loaded_pages.values():
for insight in page_insights:
all_ids.add(insight.id)
self._cached_insight_ids = all_ids
return self._cached_insight_ids
```
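For the object-recreation bullet, `functools.lru_cache` is usually enough. A minimal, self-contained sketch (the helper names and pattern are illustrative, not PostHog code):
```python
import re
from functools import lru_cache

@lru_cache(maxsize=128)
def _id_pattern(prefix: str) -> re.Pattern[str]:
    # Compiled once per distinct prefix; later calls return the cached object
    return re.compile(rf"{re.escape(prefix)}(\d+)")

def parse_ids(text: str) -> list[int]:
    return [int(match) for match in _id_pattern("insight:").findall(text)]
```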

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,39 @@
---
title: Cache invalidation consistency
description: Ensure comprehensive and consistent cache invalidation patterns across
all models that affect cached data. Every model that can impact cached content must
have proper invalidation signals, and cache keys should be designed to enable targeted
busting without affecting unrelated data.
repository: PostHog/posthog
label: Caching
language: Python
comments_count: 6
repository_stars: 28460
---
Ensure comprehensive and consistent cache invalidation patterns across all models that affect cached data. Every model that can impact cached content must have proper invalidation signals, and cache keys should be designed to enable targeted busting without affecting unrelated data.
Key requirements:
1. **Complete signal coverage**: Add both `post_save` and `post_delete` receivers for all models that affect cached data, even if soft deletes are primarily used
2. **Consistent invalidation patterns**: Don't rely on developers to remember to add cache busting - make it systematic and hard to miss
3. **Targeted cache keys**: Use Redis hashes with structured keys like `team_id:git_sha:model_name` to enable selective invalidation without clearing unrelated cache entries (sketched at the end of this note)
4. **Handle related model changes**: Consider how foreign key and many-to-many relationships affect cached data and ensure those changes also trigger appropriate invalidation
Example implementation:
```python
from django.db import models
from django.db.models.signals import post_delete, post_save
from django.dispatch import receiver

# Bad - easy to forget invalidation for new models
class ExternalDataSource(models.Model):
    objects: CacheManager = CacheManager()
    # Missing: no invalidation signals

# Good - systematic invalidation pattern
class ExternalDataSource(models.Model):
    objects: CacheManager = CacheManager()

@receiver(post_save, sender=ExternalDataSource)
@receiver(post_delete, sender=ExternalDataSource)  # Even with soft deletes
def invalidate_external_data_source_cache(sender, instance, **kwargs):
    ExternalDataSource.objects.invalidate_cache(instance.team_id)
```
This prevents the common issue where "ORM writes (including bulk updates, signals, admin edits, scripts) can bypass your invalidation path → stale cache" and ensures that cache invalidation is not an afterthought that developers can easily miss when adding new cached models.
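The targeted-keys requirement might look like the following — a minimal sketch assuming a plain `redis-py` client and the `team_id:git_sha:model_name` layout from the requirements (the hash name is invented for illustration):
```python
import redis

client = redis.Redis()

def field_for(team_id: int, git_sha: str, model_name: str) -> str:
    # Structured field inside the team's hash: team_id:git_sha:model_name
    return f"{team_id}:{git_sha}:{model_name}"

def invalidate_model(team_id: int, git_sha: str, model_name: str) -> None:
    # Deletes only this model's entry; other cached models for the team survive
    client.hdel(f"cache:team:{team_id}", field_for(team_id, git_sha, model_name))
```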

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,34 @@
---
title: Capture broad exceptions
description: When using broad exception handlers like `except Exception:`, always
capture and log the exception to avoid silent failures that are difficult to debug
in production. Broad exception handling without proper error capture masks underlying
issues and makes troubleshooting nearly impossible.
repository: PostHog/posthog
label: Error Handling
language: Python
comments_count: 4
repository_stars: 28460
---
When using broad exception handlers like `except Exception:`, always capture and log the exception to avoid silent failures that are difficult to debug in production. Broad exception handling without proper error capture masks underlying issues and makes troubleshooting nearly impossible.
**Bad:**
```python
try:
verify_github_signature(request.body.decode("utf-8"), kid, sig)
except Exception:
# Silent failure - no way to debug what went wrong
break
```
**Good:**
```python
try:
verify_github_signature(request.body.decode("utf-8"), kid, sig)
except Exception as e:
capture_exception(e) # or logger.exception()
break
```
This practice is especially critical in complex functions with nested operations where multiple failure points exist. Even when you intend to handle errors gracefully, capturing the exception provides valuable debugging information without changing the error handling behavior. The goal is to maintain your intended error recovery while ensuring production issues can be diagnosed and fixed.
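Inside functions that already use a logger, `logger.exception` is a convenient variant of the same pattern — a short sketch reusing the verification call from above:
```python
import logging

logger = logging.getLogger(__name__)

try:
    verify_github_signature(request.body.decode("utf-8"), kid, sig)
except Exception:
    # logger.exception logs at ERROR level and attaches the active traceback,
    # so the failure stays diagnosable while the graceful recovery is unchanged
    logger.exception("GitHub signature verification failed")
```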

View File

@@ -0,0 +1,114 @@
[
{
"discussion_id": "2272705859",
"pr_number": 36480,
"pr_file": "posthog/hogql/database/join_functions.py",
"created_at": "2025-08-13T09:30:42+00:00",
"commented_code": "+from collections.abc import Callable\n+from typing import Any, Literal, Optional, TypeVar, overload, cast\n+from pydantic import BaseModel\n+\n+\n+class LazyJoinFunctionSerialConfig(BaseModel):\n+ type: Literal[\"join_function\"] = \"join_function\"\n+ name: str\n+\n+\n+class LazyJoinClosureSerialConfig(BaseModel):\n+ type: Literal[\"closure\"] = \"closure\"\n+ name: str\n+ args: tuple[Any, ...]\n+\n+\n+REGISTERED_JOIN_FUNCTIONS: dict[str, Callable] = {}\n+\n+\n+REGISTERED_JOIN_CLOSURES: dict[str, Callable] = {}\n+\n+_F = TypeVar(\"_F\", bound=Callable)\n+\n+\n+@overload\n+def register_join_function(_func: _F) -> _F: ...\n+\n+\n+@overload\n+def register_join_function(*, name: Optional[str] = ..., closure: bool = ...) -> Callable[[_F], _F]: ...\n+\n+\n+def register_join_function(_func: Optional[_F] = None, *, name: Optional[str] = None, closure: bool = False):\n+ \"\"\"\n+ Decorator to register a join function in the allowlist.\n+\n+ Usage:\n+ - @register_join_function\n+ - @register_join_function()\n+ - @register_join_function(name=\"custom_name\")\n+ - @register_join_function(closure=True) # for factory functions returning a join callable\n+ \"\"\"\n+\n+ def _decorator(func: _F) -> _F:\n+ key = name or cast(str, getattr(func, \"__name__\", \"\"))\n+ if closure:\n+ REGISTERED_JOIN_CLOSURES[key] = func\n+ else:\n+ REGISTERED_JOIN_FUNCTIONS[key] = func",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2272705859",
"repo_full_name": "PostHog/posthog",
"pr_number": 36480,
"pr_file": "posthog/hogql/database/join_functions.py",
"discussion_id": "2272705859",
"commented_code": "@@ -0,0 +1,63 @@\n+from collections.abc import Callable\n+from typing import Any, Literal, Optional, TypeVar, overload, cast\n+from pydantic import BaseModel\n+\n+\n+class LazyJoinFunctionSerialConfig(BaseModel):\n+ type: Literal[\"join_function\"] = \"join_function\"\n+ name: str\n+\n+\n+class LazyJoinClosureSerialConfig(BaseModel):\n+ type: Literal[\"closure\"] = \"closure\"\n+ name: str\n+ args: tuple[Any, ...]\n+\n+\n+REGISTERED_JOIN_FUNCTIONS: dict[str, Callable] = {}\n+\n+\n+REGISTERED_JOIN_CLOSURES: dict[str, Callable] = {}\n+\n+_F = TypeVar(\"_F\", bound=Callable)\n+\n+\n+@overload\n+def register_join_function(_func: _F) -> _F: ...\n+\n+\n+@overload\n+def register_join_function(*, name: Optional[str] = ..., closure: bool = ...) -> Callable[[_F], _F]: ...\n+\n+\n+def register_join_function(_func: Optional[_F] = None, *, name: Optional[str] = None, closure: bool = False):\n+ \"\"\"\n+ Decorator to register a join function in the allowlist.\n+\n+ Usage:\n+ - @register_join_function\n+ - @register_join_function()\n+ - @register_join_function(name=\"custom_name\")\n+ - @register_join_function(closure=True) # for factory functions returning a join callable\n+ \"\"\"\n+\n+ def _decorator(func: _F) -> _F:\n+ key = name or cast(str, getattr(func, \"__name__\", \"\"))\n+ if closure:\n+ REGISTERED_JOIN_CLOSURES[key] = func\n+ else:\n+ REGISTERED_JOIN_FUNCTIONS[key] = func",
"comment_created_at": "2025-08-13T09:30:42+00:00",
"comment_author": "Gilbert09",
"comment_body": "Can we also check if the `key` already exists, if so, raise an error - we should never have overlapping keys here",
"pr_file_module": null
}
]
},
{
"discussion_id": "2275844503",
"pr_number": 36562,
"pr_file": "ee/hogai/eval/schema.py",
"created_at": "2025-08-14T08:17:34+00:00",
"commented_code": "+import json\n+from abc import ABC, abstractmethod\n+from collections.abc import Generator, Sequence\n+from typing import Generic, Self, TypeVar\n+\n+from django.db.models import Model\n+from pydantic import BaseModel\n+from pydantic_avro import AvroBase\n+\n+from posthog.models import (\n+ DataWarehouseTable,\n+ GroupTypeMapping,\n+ PropertyDefinition,\n+ Team,\n+)\n+\n+T = TypeVar(\"T\", bound=Model)\n+\n+\n+class BaseSnapshot(AvroBase, ABC, Generic[T]):\n+ @classmethod\n+ @abstractmethod\n+ def serialize_for_project(cls, project_id: int) -> Generator[Self, None, None]:\n+ raise NotImplementedError\n+\n+ @classmethod\n+ @abstractmethod\n+ def deserialize_for_project(\n+ cls, project_id: int, models: Sequence[Self], *, team_id: int\n+ ) -> Generator[T, None, None]:\n+ raise NotImplementedError\n+\n+\n+# posthog/models/team.py\n+class TeamSnapshot(BaseSnapshot[Team]):\n+ name: str\n+ test_account_filters: str\n+\n+ @classmethod\n+ def serialize_for_project(cls, project_id: int):\n+ team = Team.objects.get(pk=project_id)",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2275844503",
"repo_full_name": "PostHog/posthog",
"pr_number": 36562,
"pr_file": "ee/hogai/eval/schema.py",
"discussion_id": "2275844503",
"commented_code": "@@ -0,0 +1,143 @@\n+import json\n+from abc import ABC, abstractmethod\n+from collections.abc import Generator, Sequence\n+from typing import Generic, Self, TypeVar\n+\n+from django.db.models import Model\n+from pydantic import BaseModel\n+from pydantic_avro import AvroBase\n+\n+from posthog.models import (\n+ DataWarehouseTable,\n+ GroupTypeMapping,\n+ PropertyDefinition,\n+ Team,\n+)\n+\n+T = TypeVar(\"T\", bound=Model)\n+\n+\n+class BaseSnapshot(AvroBase, ABC, Generic[T]):\n+ @classmethod\n+ @abstractmethod\n+ def serialize_for_project(cls, project_id: int) -> Generator[Self, None, None]:\n+ raise NotImplementedError\n+\n+ @classmethod\n+ @abstractmethod\n+ def deserialize_for_project(\n+ cls, project_id: int, models: Sequence[Self], *, team_id: int\n+ ) -> Generator[T, None, None]:\n+ raise NotImplementedError\n+\n+\n+# posthog/models/team.py\n+class TeamSnapshot(BaseSnapshot[Team]):\n+ name: str\n+ test_account_filters: str\n+\n+ @classmethod\n+ def serialize_for_project(cls, project_id: int):\n+ team = Team.objects.get(pk=project_id)",
"comment_created_at": "2025-08-14T08:17:34+00:00",
"comment_author": "sortafreel",
"comment_body": "Same-ish question here - can we always guarantee that project_id exists? I assume the serialization happens right away, so there's not enough time for anything to happen in between.\r\n\r\nHowever, from my understanding, they are refreshed twice monthly, so the question stands :)",
"pr_file_module": null
},
{
"comment_id": "2275899976",
"repo_full_name": "PostHog/posthog",
"pr_number": 36562,
"pr_file": "ee/hogai/eval/schema.py",
"discussion_id": "2275844503",
"commented_code": "@@ -0,0 +1,143 @@\n+import json\n+from abc import ABC, abstractmethod\n+from collections.abc import Generator, Sequence\n+from typing import Generic, Self, TypeVar\n+\n+from django.db.models import Model\n+from pydantic import BaseModel\n+from pydantic_avro import AvroBase\n+\n+from posthog.models import (\n+ DataWarehouseTable,\n+ GroupTypeMapping,\n+ PropertyDefinition,\n+ Team,\n+)\n+\n+T = TypeVar(\"T\", bound=Model)\n+\n+\n+class BaseSnapshot(AvroBase, ABC, Generic[T]):\n+ @classmethod\n+ @abstractmethod\n+ def serialize_for_project(cls, project_id: int) -> Generator[Self, None, None]:\n+ raise NotImplementedError\n+\n+ @classmethod\n+ @abstractmethod\n+ def deserialize_for_project(\n+ cls, project_id: int, models: Sequence[Self], *, team_id: int\n+ ) -> Generator[T, None, None]:\n+ raise NotImplementedError\n+\n+\n+# posthog/models/team.py\n+class TeamSnapshot(BaseSnapshot[Team]):\n+ name: str\n+ test_account_filters: str\n+\n+ @classmethod\n+ def serialize_for_project(cls, project_id: int):\n+ team = Team.objects.get(pk=project_id)",
"comment_created_at": "2025-08-14T08:31:52+00:00",
"comment_author": "skoob13",
"comment_body": "Since the input is going to be a dataset, we don't want to continue an evaluation run if the dataset contains incorrect data. The idea here is to run for every record or fail altogether.",
"pr_file_module": null
}
]
},
{
"discussion_id": "2257685078",
"pr_number": 36258,
"pr_file": "posthog/tasks/email.py",
"created_at": "2025-08-06T16:18:04+00:00",
"commented_code": "# Build the dictionaries from the optimized result set\n for activity in latest_activities:\n- if activity.user:\n- last_editors[activity.item_id] = activity.user.email\n- last_edit_dates[activity.item_id] = activity.created_at.strftime(\"%Y-%m-%d\")\n- else:\n- last_editors[activity.item_id] = None\n- last_edit_dates[activity.item_id] = None\n+ if activity.item_id is not None: # Ensure item_id is not None before using as dict key",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2257685078",
"repo_full_name": "PostHog/posthog",
"pr_number": 36258,
"pr_file": "posthog/tasks/email.py",
"discussion_id": "2257685078",
"commented_code": "@@ -728,12 +729,13 @@ def send_team_hog_functions_digest(team_id: int, hog_function_ids: list[str] | N\n \n # Build the dictionaries from the optimized result set\n for activity in latest_activities:\n- if activity.user:\n- last_editors[activity.item_id] = activity.user.email\n- last_edit_dates[activity.item_id] = activity.created_at.strftime(\"%Y-%m-%d\")\n- else:\n- last_editors[activity.item_id] = None\n- last_edit_dates[activity.item_id] = None\n+ if activity.item_id is not None: # Ensure item_id is not None before using as dict key",
"comment_created_at": "2025-08-06T16:18:04+00:00",
"comment_author": "Twixes",
"comment_body": "Nit: Typically clearer to do the less-nested `if activity.item_id is None: continue` in cases like this ",
"pr_file_module": null
}
]
},
{
"discussion_id": "2251121522",
"pr_number": 36089,
"pr_file": "ee/hogai/assistant.py",
"created_at": "2025-08-04T10:50:48+00:00",
"commented_code": "def _chunk_reasoning_headline(self, reasoning: dict[str, Any]) -> Optional[str]:\n \"\"\"Process a chunk of OpenAI `reasoning`, and if a new headline was just finalized, return it.\"\"\"\n try:\n- summary_text_chunk = reasoning[\"summary\"][0][\"text\"]\n- except (KeyError, IndexError) as e:\n- capture_exception(e)\n- self._reasoning_headline_chunk = None # not expected, so let's just reset\n+ if summary := reasoning.get(\"summary\"):\n+ summary_text_chunk = summary[0][\"text\"]",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2251121522",
"repo_full_name": "PostHog/posthog",
"pr_number": 36089,
"pr_file": "ee/hogai/assistant.py",
"discussion_id": "2251121522",
"commented_code": "@@ -546,10 +546,15 @@ def _process_memory_initializer_chunk(self, langchain_message: AIMessageChunk) -\n def _chunk_reasoning_headline(self, reasoning: dict[str, Any]) -> Optional[str]:\n \"\"\"Process a chunk of OpenAI `reasoning`, and if a new headline was just finalized, return it.\"\"\"\n try:\n- summary_text_chunk = reasoning[\"summary\"][0][\"text\"]\n- except (KeyError, IndexError) as e:\n- capture_exception(e)\n- self._reasoning_headline_chunk = None # not expected, so let's just reset\n+ if summary := reasoning.get(\"summary\"):\n+ summary_text_chunk = summary[0][\"text\"]",
"comment_created_at": "2025-08-04T10:50:48+00:00",
"comment_author": "sortafreel",
"comment_body": "`IndexError` seems possible still, as you access `summary[0]`?",
"pr_file_module": null
},
{
"comment_id": "2251594412",
"repo_full_name": "PostHog/posthog",
"pr_number": 36089,
"pr_file": "ee/hogai/assistant.py",
"discussion_id": "2251121522",
"commented_code": "@@ -546,10 +546,15 @@ def _process_memory_initializer_chunk(self, langchain_message: AIMessageChunk) -\n def _chunk_reasoning_headline(self, reasoning: dict[str, Any]) -> Optional[str]:\n \"\"\"Process a chunk of OpenAI `reasoning`, and if a new headline was just finalized, return it.\"\"\"\n try:\n- summary_text_chunk = reasoning[\"summary\"][0][\"text\"]\n- except (KeyError, IndexError) as e:\n- capture_exception(e)\n- self._reasoning_headline_chunk = None # not expected, so let's just reset\n+ if summary := reasoning.get(\"summary\"):\n+ summary_text_chunk = summary[0][\"text\"]",
"comment_created_at": "2025-08-04T14:00:50+00:00",
"comment_author": "Twixes",
"comment_body": "Actually not, because the walrus operator checks for _truthiness_, not non-null",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,39 @@
---
title: Check existence before operations
description: Always verify that keys, IDs, indices, or other required values exist
before performing operations that depend on them. This prevents runtime errors and
unexpected behavior from null references or missing data.
repository: PostHog/posthog
label: Null Handling
language: Python
comments_count: 4
repository_stars: 28460
---
Always verify that keys, IDs, indices, or other required values exist before performing operations that depend on them. This prevents runtime errors and unexpected behavior from null references or missing data.
Key patterns to follow:
- Check if dictionary keys exist before registration or access
- Validate that database IDs exist before queries
- Verify array indices are within bounds before access
- Use early exits for cleaner null handling, e.g. `if value is None: continue` inside a loop (written out in the sketch after the example below)
Example of the pattern:
```python
# Before: Potential KeyError or duplicate registration
REGISTERED_FUNCTIONS[key] = func
# After: Check existence first
if key in REGISTERED_FUNCTIONS:
raise ValueError(f"Function {key} already registered")
REGISTERED_FUNCTIONS[key] = func
# Before: Potential IndexError
summary_text_chunk = summary[0]["text"]
# After: A truthiness check covers both None and empty sequences
if summary:
    summary_text_chunk = summary[0]["text"]
```
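The early-exit bullet written out as a loop — a sketch using the field names from the digest discussion:
```python
for activity in latest_activities:
    if activity.item_id is None:
        continue  # Bail out up front instead of nesting the happy path
    if activity.user:
        last_editors[activity.item_id] = activity.user.email
        last_edit_dates[activity.item_id] = activity.created_at.strftime("%Y-%m-%d")
    else:
        last_editors[activity.item_id] = None
        last_edit_dates[activity.item_id] = None
```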
This proactive approach prevents silent failures and makes code more robust by catching potential issues at the point of access rather than allowing them to propagate.

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,51 @@
---
title: Configuration constants management
description: Extract configuration values into well-named constants instead of using
magic numbers or inline values. Use consistent naming patterns across environments
and organize configuration values in a maintainable way.
repository: PostHog/posthog
label: Configurations
language: Python
comments_count: 4
repository_stars: 28460
---
Extract configuration values into well-named constants instead of using magic numbers or inline values. Use consistent naming patterns across environments and organize configuration values in a maintainable way.
**Why this matters:**
- Magic numbers scattered throughout code are hard to maintain and understand
- Inconsistent naming across environments leads to confusion and errors
- Centralized configuration makes it easier to modify behavior without hunting through code
**How to apply:**
1. Replace magic numbers with descriptive constants
2. Use consistent naming patterns across all environments
3. Group related configuration values together (see the dataclass sketch after the example)
4. Make non-parameterized values into module-level constants
**Example:**
```python
# Bad - magic numbers and inconsistent naming
def paginate_results(self):
self._page_size = 50
max_pages = 6
timeout = 180
if storage_policy == "s3":
policy = "s3_policy" # Different naming in other envs
# Good - named constants with consistent patterns
DEFAULT_PAGE_SIZE = 50
MAX_PAGES_LIMIT = 6
REQUEST_TIMEOUT_SECONDS = 180
S3_STORAGE_POLICY = "s3_backed" # Same across all environments
BASE_ERROR_INSTRUCTIONS = "Tell the user that you encountered an issue..."
def paginate_results(self):
self._page_size = DEFAULT_PAGE_SIZE
max_pages = MAX_PAGES_LIMIT
timeout = REQUEST_TIMEOUT_SECONDS
```
This approach makes configuration changes safer, more discoverable, and reduces the risk of environment-specific bugs.
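One way to apply the "group related values" point is a small frozen dataclass — a sketch with invented names, not a prescribed implementation:
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PaginationConfig:
    """Related limits live together and cannot drift apart at runtime."""
    page_size: int = 50
    max_pages: int = 6
    timeout_seconds: int = 180

PAGINATION = PaginationConfig()

# Call sites read one well-named object instead of scattered magic numbers
assert PAGINATION.timeout_seconds == 180
```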

View File

@@ -0,0 +1,58 @@
[
{
"discussion_id": "2283806422",
"pr_number": 36794,
"pr_file": "frontend/src/scenes/settings/SettingsMap.tsx",
"created_at": "2025-08-19T01:19:01+00:00",
"commented_code": "component: <PreAggregatedTablesSetting />,\n flag: 'SETTINGS_WEB_ANALYTICS_PRE_AGGREGATED_TABLES',\n },\n+ {\n+ id: 'web-analytics-opt-int-pre-aggregated-tables-and-api',\n+ title: 'New query engine',",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2283806422",
"repo_full_name": "PostHog/posthog",
"pr_number": 36794,
"pr_file": "frontend/src/scenes/settings/SettingsMap.tsx",
"discussion_id": "2283806422",
"commented_code": "@@ -335,6 +336,12 @@ export const SETTINGS_MAP: SettingSection[] = [\n component: <PreAggregatedTablesSetting />,\n flag: 'SETTINGS_WEB_ANALYTICS_PRE_AGGREGATED_TABLES',\n },\n+ {\n+ id: 'web-analytics-opt-int-pre-aggregated-tables-and-api',\n+ title: 'New query engine',",
"comment_created_at": "2025-08-19T01:19:01+00:00",
"comment_author": "lricoy",
"comment_body": "I am intentionally using the name \"New query engine\" even though the flag is \"WEB_ANALYTICS_API\" because this is the only way to opt-in right now, and it won't break things for people using the API. \r\n\r\nThe next PRs will make this simpler.",
"pr_file_module": null
}
]
},
{
"discussion_id": "2211545342",
"pr_number": 34922,
"pr_file": "frontend/src/queries/nodes/DataTable/DataTableExport.tsx",
"created_at": "2025-07-16T20:34:37+00:00",
"commented_code": "import { dataTableLogic, DataTableRow } from './dataTableLogic'\n \n // Sync with posthog/hogql/constants.py\n-export const MAX_SELECT_RETURNED_ROWS = 50000\n+export const MAX_SELECT_RETURNED_ROWS = 300000",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2211545342",
"repo_full_name": "PostHog/posthog",
"pr_number": 34922,
"pr_file": "frontend/src/queries/nodes/DataTable/DataTableExport.tsx",
"discussion_id": "2211545342",
"commented_code": "@@ -26,7 +26,7 @@ import { ExporterFormat } from '~/types'\n import { dataTableLogic, DataTableRow } from './dataTableLogic'\n \n // Sync with posthog/hogql/constants.py\n-export const MAX_SELECT_RETURNED_ROWS = 50000\n+export const MAX_SELECT_RETURNED_ROWS = 300000",
"comment_created_at": "2025-07-16T20:34:37+00:00",
"comment_author": "orian",
"comment_body": "why do you set this to 300K?",
"pr_file_module": null
},
{
"comment_id": "2211575211",
"repo_full_name": "PostHog/posthog",
"pr_number": 34922,
"pr_file": "frontend/src/queries/nodes/DataTable/DataTableExport.tsx",
"discussion_id": "2211545342",
"commented_code": "@@ -26,7 +26,7 @@ import { ExporterFormat } from '~/types'\n import { dataTableLogic, DataTableRow } from './dataTableLogic'\n \n // Sync with posthog/hogql/constants.py\n-export const MAX_SELECT_RETURNED_ROWS = 50000\n+export const MAX_SELECT_RETURNED_ROWS = 300000",
"comment_created_at": "2025-07-16T20:47:18+00:00",
"comment_author": "EDsCODE",
"comment_body": "this is the CSV export limit but the var is using the old name. CSV will export up to 300k. Just changed the naming to reflect",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,27 @@
---
title: Configuration naming clarity
description: Ensure configuration variable names, display labels, and feature flag
names accurately reflect their actual purpose and behavior. Misleading or outdated
names create confusion for both developers and users, making the codebase harder
to maintain and understand.
repository: PostHog/posthog
label: Configurations
language: TSX
comments_count: 2
repository_stars: 28460
---
Ensure configuration variable names, display labels, and feature flag names accurately reflect their actual purpose and behavior. Misleading or outdated names create confusion for both developers and users, making the codebase harder to maintain and understand.
When configuration names don't match their function, it leads to situations where developers must add explanatory comments or use workarounds. For example, avoid using generic names like "New query engine" when the underlying flag is "WEB_ANALYTICS_API", and update variable names when their scope changes (like renaming MAX_SELECT_RETURNED_ROWS when it specifically applies to CSV exports).
Example of good practice:
```typescript
// Good: Name reflects actual purpose
const MAX_CSV_EXPORT_ROWS = 300000
// Bad: Generic name doesn't indicate CSV-specific usage
const MAX_SELECT_RETURNED_ROWS = 300000
```
When introducing new features or changing existing ones, take time to review and update related configuration names to maintain clarity and prevent technical debt.

View File

@@ -0,0 +1,172 @@
[
{
"discussion_id": "2272833477",
"pr_number": 36553,
"pr_file": "frontend/src/scenes/session-recordings/filters/RecordingsUniversalFiltersEmbed.tsx",
"created_at": "2025-08-13T10:14:26+00:00",
"commented_code": "</LemonButton>\n </div>\n {SaveFiltersModal()}\n+ <div className=\"flex flex-row gap-2 w-full mt-40\">\n+ <ReplayActiveUsersTable />\n+ <ReplayActiveScreensTable />\n+ </div>",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2272833477",
"repo_full_name": "PostHog/posthog",
"pr_number": 36553,
"pr_file": "frontend/src/scenes/session-recordings/filters/RecordingsUniversalFiltersEmbed.tsx",
"discussion_id": "2272833477",
"commented_code": "@@ -487,6 +490,10 @@ export const RecordingsUniversalFiltersEmbed = ({\n </LemonButton>\n </div>\n {SaveFiltersModal()}\n+ <div className=\"flex flex-row gap-2 w-full mt-40\">\n+ <ReplayActiveUsersTable />\n+ <ReplayActiveScreensTable />\n+ </div>",
"comment_created_at": "2025-08-13T10:14:26+00:00",
"comment_author": "pauldambra",
"comment_body": "i wonder should we _move_ them rather than duplicate them\r\nfolk are clicking these in the figure out what to watch tab\r\nbut i'd actually love to get rid of that tab and fold everything into one experience",
"pr_file_module": null
},
{
"comment_id": "2272915543",
"repo_full_name": "PostHog/posthog",
"pr_number": 36553,
"pr_file": "frontend/src/scenes/session-recordings/filters/RecordingsUniversalFiltersEmbed.tsx",
"discussion_id": "2272833477",
"commented_code": "@@ -487,6 +490,10 @@ export const RecordingsUniversalFiltersEmbed = ({\n </LemonButton>\n </div>\n {SaveFiltersModal()}\n+ <div className=\"flex flex-row gap-2 w-full mt-40\">\n+ <ReplayActiveUsersTable />\n+ <ReplayActiveScreensTable />\n+ </div>",
"comment_created_at": "2025-08-13T10:41:40+00:00",
"comment_author": "veryayskiy",
"comment_body": "Like move everything inside the Filters tab?\r\n\r\nWhat would happen to the \"Figure out what to watch\" tab? As we also have \"Filter templates\" there and folks check it https://us.posthog.com/project/2/insights/7IYmcsie\r\n\r\n",
"pr_file_module": null
},
{
"comment_id": "2272930465",
"repo_full_name": "PostHog/posthog",
"pr_number": 36553,
"pr_file": "frontend/src/scenes/session-recordings/filters/RecordingsUniversalFiltersEmbed.tsx",
"discussion_id": "2272833477",
"commented_code": "@@ -487,6 +490,10 @@ export const RecordingsUniversalFiltersEmbed = ({\n </LemonButton>\n </div>\n {SaveFiltersModal()}\n+ <div className=\"flex flex-row gap-2 w-full mt-40\">\n+ <ReplayActiveUsersTable />\n+ <ReplayActiveScreensTable />\n+ </div>",
"comment_created_at": "2025-08-13T10:46:15+00:00",
"comment_author": "pauldambra",
"comment_body": "## yep, i'd move these from fowtw to filters\r\n\r\n<img width=\"1218\" height=\"473\" alt=\"Screenshot 2025-08-13 at 11 43 40\" src=\"https://github.com/user-attachments/assets/f5506eed-77ab-41ff-9c6b-39bfc175b17d\" />\r\n\r\n## i'd leave this there for now\r\n\r\n<img width=\"1413\" height=\"389\" alt=\"Screenshot 2025-08-13 at 11 44 43\" src=\"https://github.com/user-attachments/assets/0d310831-f5d4-4e5a-a5d6-f8edae1dd05a\" />\r\n\r\ni think it's cool, but i don't think people come back to it\r\n\r\n## and i'd leave filter templates\r\n\r\n<img width=\"1408\" height=\"504\" alt=\"Screenshot 2025-08-13 at 11 45 04\" src=\"https://github.com/user-attachments/assets/b0fffc8c-f1d7-4d2e-bb67-8a0f13f2f4b5\" />\r\n\r\n---\r\n\r\nremembering the last time i looked at the data here, i think people are interacting with the email and URL stuff more\r\n\r\n---\r\n",
"pr_file_module": null
},
{
"comment_id": "2275822466",
"repo_full_name": "PostHog/posthog",
"pr_number": 36553,
"pr_file": "frontend/src/scenes/session-recordings/filters/RecordingsUniversalFiltersEmbed.tsx",
"discussion_id": "2272833477",
"commented_code": "@@ -487,6 +490,10 @@ export const RecordingsUniversalFiltersEmbed = ({\n </LemonButton>\n </div>\n {SaveFiltersModal()}\n+ <div className=\"flex flex-row gap-2 w-full mt-40\">\n+ <ReplayActiveUsersTable />\n+ <ReplayActiveScreensTable />\n+ </div>",
"comment_created_at": "2025-08-14T08:09:20+00:00",
"comment_author": "pauldambra",
"comment_body": "and then we need to figure out the performance issue with the users box here too\r\n\r\nannoyingly they need to move together really or the UI is unbalanced \ud83e\udee0 \r\n",
"pr_file_module": null
},
{
"comment_id": "2276727543",
"repo_full_name": "PostHog/posthog",
"pr_number": 36553,
"pr_file": "frontend/src/scenes/session-recordings/filters/RecordingsUniversalFiltersEmbed.tsx",
"discussion_id": "2272833477",
"commented_code": "@@ -487,6 +490,10 @@ export const RecordingsUniversalFiltersEmbed = ({\n </LemonButton>\n </div>\n {SaveFiltersModal()}\n+ <div className=\"flex flex-row gap-2 w-full mt-40\">\n+ <ReplayActiveUsersTable />\n+ <ReplayActiveScreensTable />\n+ </div>",
"comment_created_at": "2025-08-14T14:00:19+00:00",
"comment_author": "veryayskiy",
"comment_body": "Good point. I removed them from fwtw to avoid duplication.",
"pr_file_module": null
},
{
"comment_id": "2284802043",
"repo_full_name": "PostHog/posthog",
"pr_number": 36553,
"pr_file": "frontend/src/scenes/session-recordings/filters/RecordingsUniversalFiltersEmbed.tsx",
"discussion_id": "2272833477",
"commented_code": "@@ -487,6 +490,10 @@ export const RecordingsUniversalFiltersEmbed = ({\n </LemonButton>\n </div>\n {SaveFiltersModal()}\n+ <div className=\"flex flex-row gap-2 w-full mt-40\">\n+ <ReplayActiveUsersTable />\n+ <ReplayActiveScreensTable />\n+ </div>",
"comment_created_at": "2025-08-19T10:17:53+00:00",
"comment_author": "veryayskiy",
"comment_body": "@pauldambra I think it's good to go, or do you want more adjustments?",
"pr_file_module": null
}
]
},
{
"discussion_id": "2033004440",
"pr_number": 29364,
"pr_file": "frontend/src/scenes/insights/filters/ActionFilter/ActionFilter.tsx",
"created_at": "2025-04-08T11:39:20+00:00",
"commented_code": ") : null}\n {!singleFilter && (\n <div className=\"ActionFilter-footer\">\n- {!singleFilter && (",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2033004440",
"repo_full_name": "PostHog/posthog",
"pr_number": 29364,
"pr_file": "frontend/src/scenes/insights/filters/ActionFilter/ActionFilter.tsx",
"discussion_id": "2033004440",
"commented_code": "@@ -242,23 +242,21 @@ export const ActionFilter = React.forwardRef<HTMLDivElement, ActionFilterProps>(\n ) : null}\n {!singleFilter && (\n <div className=\"ActionFilter-footer\">\n- {!singleFilter && (",
"comment_created_at": "2025-04-08T11:39:20+00:00",
"comment_author": "thmsobrmlr",
"comment_body": "fly-by change: the `!singleFilter` check is already done a couple of lines above at the wrapping div. As such the check isn't needed again here.",
"pr_file_module": null
}
]
},
{
"discussion_id": "2251871281",
"pr_number": 36129,
"pr_file": "frontend/src/scenes/experiments/ExperimentView/components.tsx",
"created_at": "2025-08-04T15:43:08+00:00",
"commented_code": ">\n Create dashboard\n </LemonButton>\n+ <LemonButton\n+ onClick={() => {\n+ if (experiment.feature_flag?.id) {\n+ featureFlagLogic({ id: experiment.feature_flag.id }).mount()",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2251871281",
"repo_full_name": "PostHog/posthog",
"pr_number": 36129,
"pr_file": "frontend/src/scenes/experiments/ExperimentView/components.tsx",
"discussion_id": "2251871281",
"commented_code": "@@ -356,6 +357,21 @@ export function PageHeaderCustom(): JSX.Element {\n >\n Create dashboard\n </LemonButton>\n+ <LemonButton\n+ onClick={() => {\n+ if (experiment.feature_flag?.id) {\n+ featureFlagLogic({ id: experiment.feature_flag.id }).mount()",
"comment_created_at": "2025-08-04T15:43:08+00:00",
"comment_author": "marandaneto",
"comment_body": "reusing `featureFlagLogic` instead of duplicating the very same things. DRY",
"pr_file_module": null
}
]
},
{
"discussion_id": "2252270856",
"pr_number": 35986,
"pr_file": "frontend/src/scenes/billing/billingLogic.tsx",
"created_at": "2025-08-04T18:20:30+00:00",
"commented_code": "if (isHidden) {\n return\n }\n+\n+ // Check if this alert was dismissed for the current billing period\n+ const billingPeriodEnd = values.billing?.billing_period?.current_period_end?.format('YYYY-MM-DD')",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2252270856",
"repo_full_name": "PostHog/posthog",
"pr_number": 35986,
"pr_file": "frontend/src/scenes/billing/billingLogic.tsx",
"discussion_id": "2252270856",
"commented_code": "@@ -720,6 +760,29 @@ export const billingLogic = kea<billingLogicType>([\n if (isHidden) {\n return\n }\n+\n+ // Check if this alert was dismissed for the current billing period\n+ const billingPeriodEnd = values.billing?.billing_period?.current_period_end?.format('YYYY-MM-DD')",
"comment_created_at": "2025-08-04T18:20:30+00:00",
"comment_author": "zlwaterfield",
"comment_body": "Nit: this is duplicated exactly except for the dismissKey so it should be moved into a function",
"pr_file_module": null
}
]
},
{
"discussion_id": "2252271207",
"pr_number": 35986,
"pr_file": "frontend/src/scenes/billing/billingLogic.tsx",
"created_at": "2025-08-04T18:20:42+00:00",
"commented_code": "productApproachingLimit.usage_key && productApproachingLimit.usage_key.toLowerCase()\n } allocation.`,\n dismissKey: 'usage-limit-approaching',\n+ onClose: () => {\n+ // Store dismissal in localStorage\n+ const billingPeriodEnd =",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2252271207",
"repo_full_name": "PostHog/posthog",
"pr_number": 35986,
"pr_file": "frontend/src/scenes/billing/billingLogic.tsx",
"discussion_id": "2252271207",
"commented_code": "@@ -729,6 +792,21 @@ export const billingLogic = kea<billingLogicType>([\n productApproachingLimit.usage_key && productApproachingLimit.usage_key.toLowerCase()\n } allocation.`,\n dismissKey: 'usage-limit-approaching',\n+ onClose: () => {\n+ // Store dismissal in localStorage\n+ const billingPeriodEnd =",
"comment_created_at": "2025-08-04T18:20:42+00:00",
"comment_author": "zlwaterfield",
"comment_body": "Nit: this is duplicated exactly except for the dismissKey so it should be moved into a function",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,54 @@
---
title: Eliminate code duplication
description: Actively identify and eliminate code duplication in all its forms to
improve maintainability and reduce bugs. This includes removing redundant conditional
checks, extracting common functionality into reusable functions, and avoiding component
or logic duplication.
repository: PostHog/posthog
label: Code Style
language: TSX
comments_count: 5
repository_stars: 28460
---

Actively identify and eliminate code duplication in all its forms to improve maintainability and reduce bugs. This includes removing redundant conditional checks, extracting common functionality into reusable functions, and avoiding component or logic duplication.

Common patterns to watch for:

- **Redundant conditionals**: Remove duplicate checks that are already handled by parent conditions
- **Duplicated logic blocks**: Extract identical code (except for small variations like keys) into shared functions
- **Component duplication**: Move or reuse existing components instead of creating duplicates
- **Logic reuse**: Leverage existing logic instances rather than reimplementing the same functionality

Example of redundant conditional elimination:

```tsx
// Before: redundant check
{!singleFilter && (
    <div className="ActionFilter-footer">
        {!singleFilter && ( // ❌ Already checked above
            <SomeComponent />
        )}
    </div>
)}

// After: clean structure
{!singleFilter && (
    <div className="ActionFilter-footer">
        <SomeComponent /> {/* ✅ No redundant check */}
    </div>
)}
```

Example of extracting duplicated logic:

```tsx
// Before: duplicated except for dismissKey
const billingPeriodEnd = values.billing?.billing_period?.current_period_end?.format('YYYY-MM-DD')
// ... identical logic with different dismissKey

// After: extract to function
const handleDismissal = (dismissKey: string): void => {
    const billingPeriodEnd = values.billing?.billing_period?.current_period_end?.format('YYYY-MM-DD')
    // ... shared logic
}
```

Always prefer reusing existing implementations over creating new ones, and extract common patterns into shared utilities when the same logic appears multiple times.
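
For logic reuse specifically, a minimal sketch based on the `featureFlagLogic` discussion above (the import path and surrounding component are assumptions):

```tsx
import { featureFlagLogic } from 'scenes/feature-flags/featureFlagLogic'

function openFeatureFlag(flagId: number): void {
    // Mount the existing logic so its state and loaders are shared,
    // instead of re-implementing flag fetching in this component
    featureFlagLogic({ id: flagId }).mount()
}
```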

View File

@@ -0,0 +1,94 @@
[
{
"discussion_id": "2273739168",
"pr_number": 36519,
"pr_file": "posthog/tasks/email.py",
"created_at": "2025-08-13T14:57:47+00:00",
"commented_code": "message.send()\n \n \n+@shared_task(**EMAIL_TASK_KWARGS)\n+def login_from_new_device_notification(\n+ user_id: int, login_time: datetime, short_user_agent: str, ip_address: str\n+) -> None:\n+ \"\"\"Send login notification email if login is from a new device\"\"\"\n+ if not is_email_available(with_absolute_urls=True):\n+ return\n+\n+ user: User = User.objects.get(pk=user_id)\n+\n+ # Send email if feature flag is enabled or in tests\n+ if settings.TEST:\n+ enabled = True\n+ elif user.current_organization is None:\n+ enabled = False\n+ else:\n+ enabled = posthoganalytics.feature_enabled(\n+ key=\"login-from-new-device-notification\",\n+ distinct_id=str(user.distinct_id),\n+ groups={\"organization\": str(user.current_organization.id)},\n+ )\n+\n+ if not enabled:\n+ return\n+\n+ is_new_device = check_and_cache_login_device(user_id, ip_address, short_user_agent)\n+ if not is_new_device:\n+ return\n+\n+ login_time_str = login_time.strftime(\"%B %-d, %Y at %H:%M UTC\")\n+ geoip_data = get_geoip_properties(ip_address)\n+\n+ # Compose location as \"City, Country\" (omit city if missing)\n+ location = \", \".join(\n+ part\n+ for part in [geoip_data.get(\"$geoip_city_name\", \"\"), geoip_data.get(\"$geoip_country_name\", \"Unknown\")]\n+ if part\n+ )\n+\n+ message = EmailMessage(\n+ use_http=True,\n+ campaign_key=f\"login_notification_{user.uuid}-{timezone.now().timestamp()}\",\n+ template_name=\"login_notification\",\n+ subject=\"A new device logged into your account\",\n+ template_context={\n+ \"login_time\": login_time_str,\n+ \"ip_address\": ip_address,\n+ \"location\": location,\n+ \"browser\": short_user_agent,\n+ },\n+ )\n+ message.add_recipient(user.email)\n+ message.send()\n+ report_user_action(user=user, event=\"login notification sent\")",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2273739168",
"repo_full_name": "PostHog/posthog",
"pr_number": 36519,
"pr_file": "posthog/tasks/email.py",
"discussion_id": "2273739168",
"commented_code": "@@ -453,6 +456,62 @@ def send_two_factor_auth_backup_code_used_email(user_id: int) -> None:\n message.send()\n \n \n+@shared_task(**EMAIL_TASK_KWARGS)\n+def login_from_new_device_notification(\n+ user_id: int, login_time: datetime, short_user_agent: str, ip_address: str\n+) -> None:\n+ \"\"\"Send login notification email if login is from a new device\"\"\"\n+ if not is_email_available(with_absolute_urls=True):\n+ return\n+\n+ user: User = User.objects.get(pk=user_id)\n+\n+ # Send email if feature flag is enabled or in tests\n+ if settings.TEST:\n+ enabled = True\n+ elif user.current_organization is None:\n+ enabled = False\n+ else:\n+ enabled = posthoganalytics.feature_enabled(\n+ key=\"login-from-new-device-notification\",\n+ distinct_id=str(user.distinct_id),\n+ groups={\"organization\": str(user.current_organization.id)},\n+ )\n+\n+ if not enabled:\n+ return\n+\n+ is_new_device = check_and_cache_login_device(user_id, ip_address, short_user_agent)\n+ if not is_new_device:\n+ return\n+\n+ login_time_str = login_time.strftime(\"%B %-d, %Y at %H:%M UTC\")\n+ geoip_data = get_geoip_properties(ip_address)\n+\n+ # Compose location as \"City, Country\" (omit city if missing)\n+ location = \", \".join(\n+ part\n+ for part in [geoip_data.get(\"$geoip_city_name\", \"\"), geoip_data.get(\"$geoip_country_name\", \"Unknown\")]\n+ if part\n+ )\n+\n+ message = EmailMessage(\n+ use_http=True,\n+ campaign_key=f\"login_notification_{user.uuid}-{timezone.now().timestamp()}\",\n+ template_name=\"login_notification\",\n+ subject=\"A new device logged into your account\",\n+ template_context={\n+ \"login_time\": login_time_str,\n+ \"ip_address\": ip_address,\n+ \"location\": location,\n+ \"browser\": short_user_agent,\n+ },\n+ )\n+ message.add_recipient(user.email)\n+ message.send()\n+ report_user_action(user=user, event=\"login notification sent\")",
"comment_created_at": "2025-08-13T14:57:47+00:00",
"comment_author": "zlwaterfield",
"comment_body": "Can we add some properties here for debugging this?",
"pr_file_module": null
},
{
"comment_id": "2273739958",
"repo_full_name": "PostHog/posthog",
"pr_number": 36519,
"pr_file": "posthog/tasks/email.py",
"discussion_id": "2273739168",
"commented_code": "@@ -453,6 +456,62 @@ def send_two_factor_auth_backup_code_used_email(user_id: int) -> None:\n message.send()\n \n \n+@shared_task(**EMAIL_TASK_KWARGS)\n+def login_from_new_device_notification(\n+ user_id: int, login_time: datetime, short_user_agent: str, ip_address: str\n+) -> None:\n+ \"\"\"Send login notification email if login is from a new device\"\"\"\n+ if not is_email_available(with_absolute_urls=True):\n+ return\n+\n+ user: User = User.objects.get(pk=user_id)\n+\n+ # Send email if feature flag is enabled or in tests\n+ if settings.TEST:\n+ enabled = True\n+ elif user.current_organization is None:\n+ enabled = False\n+ else:\n+ enabled = posthoganalytics.feature_enabled(\n+ key=\"login-from-new-device-notification\",\n+ distinct_id=str(user.distinct_id),\n+ groups={\"organization\": str(user.current_organization.id)},\n+ )\n+\n+ if not enabled:\n+ return\n+\n+ is_new_device = check_and_cache_login_device(user_id, ip_address, short_user_agent)\n+ if not is_new_device:\n+ return\n+\n+ login_time_str = login_time.strftime(\"%B %-d, %Y at %H:%M UTC\")\n+ geoip_data = get_geoip_properties(ip_address)\n+\n+ # Compose location as \"City, Country\" (omit city if missing)\n+ location = \", \".join(\n+ part\n+ for part in [geoip_data.get(\"$geoip_city_name\", \"\"), geoip_data.get(\"$geoip_country_name\", \"Unknown\")]\n+ if part\n+ )\n+\n+ message = EmailMessage(\n+ use_http=True,\n+ campaign_key=f\"login_notification_{user.uuid}-{timezone.now().timestamp()}\",\n+ template_name=\"login_notification\",\n+ subject=\"A new device logged into your account\",\n+ template_context={\n+ \"login_time\": login_time_str,\n+ \"ip_address\": ip_address,\n+ \"location\": location,\n+ \"browser\": short_user_agent,\n+ },\n+ )\n+ message.add_recipient(user.email)\n+ message.send()\n+ report_user_action(user=user, event=\"login notification sent\")",
"comment_created_at": "2025-08-13T14:58:03+00:00",
"comment_author": "zlwaterfield",
"comment_body": "Like IP and User Agent and Geo",
"pr_file_module": null
},
{
"comment_id": "2279502001",
"repo_full_name": "PostHog/posthog",
"pr_number": 36519,
"pr_file": "posthog/tasks/email.py",
"discussion_id": "2273739168",
"commented_code": "@@ -453,6 +456,62 @@ def send_two_factor_auth_backup_code_used_email(user_id: int) -> None:\n message.send()\n \n \n+@shared_task(**EMAIL_TASK_KWARGS)\n+def login_from_new_device_notification(\n+ user_id: int, login_time: datetime, short_user_agent: str, ip_address: str\n+) -> None:\n+ \"\"\"Send login notification email if login is from a new device\"\"\"\n+ if not is_email_available(with_absolute_urls=True):\n+ return\n+\n+ user: User = User.objects.get(pk=user_id)\n+\n+ # Send email if feature flag is enabled or in tests\n+ if settings.TEST:\n+ enabled = True\n+ elif user.current_organization is None:\n+ enabled = False\n+ else:\n+ enabled = posthoganalytics.feature_enabled(\n+ key=\"login-from-new-device-notification\",\n+ distinct_id=str(user.distinct_id),\n+ groups={\"organization\": str(user.current_organization.id)},\n+ )\n+\n+ if not enabled:\n+ return\n+\n+ is_new_device = check_and_cache_login_device(user_id, ip_address, short_user_agent)\n+ if not is_new_device:\n+ return\n+\n+ login_time_str = login_time.strftime(\"%B %-d, %Y at %H:%M UTC\")\n+ geoip_data = get_geoip_properties(ip_address)\n+\n+ # Compose location as \"City, Country\" (omit city if missing)\n+ location = \", \".join(\n+ part\n+ for part in [geoip_data.get(\"$geoip_city_name\", \"\"), geoip_data.get(\"$geoip_country_name\", \"Unknown\")]\n+ if part\n+ )\n+\n+ message = EmailMessage(\n+ use_http=True,\n+ campaign_key=f\"login_notification_{user.uuid}-{timezone.now().timestamp()}\",\n+ template_name=\"login_notification\",\n+ subject=\"A new device logged into your account\",\n+ template_context={\n+ \"login_time\": login_time_str,\n+ \"ip_address\": ip_address,\n+ \"location\": location,\n+ \"browser\": short_user_agent,\n+ },\n+ )\n+ message.add_recipient(user.email)\n+ message.send()\n+ report_user_action(user=user, event=\"login notification sent\")",
"comment_created_at": "2025-08-15T17:01:37+00:00",
"comment_author": "a-lider",
"comment_body": "Added this",
"pr_file_module": null
},
{
"comment_id": "2279507571",
"repo_full_name": "PostHog/posthog",
"pr_number": 36519,
"pr_file": "posthog/tasks/email.py",
"discussion_id": "2273739168",
"commented_code": "@@ -453,6 +456,62 @@ def send_two_factor_auth_backup_code_used_email(user_id: int) -> None:\n message.send()\n \n \n+@shared_task(**EMAIL_TASK_KWARGS)\n+def login_from_new_device_notification(\n+ user_id: int, login_time: datetime, short_user_agent: str, ip_address: str\n+) -> None:\n+ \"\"\"Send login notification email if login is from a new device\"\"\"\n+ if not is_email_available(with_absolute_urls=True):\n+ return\n+\n+ user: User = User.objects.get(pk=user_id)\n+\n+ # Send email if feature flag is enabled or in tests\n+ if settings.TEST:\n+ enabled = True\n+ elif user.current_organization is None:\n+ enabled = False\n+ else:\n+ enabled = posthoganalytics.feature_enabled(\n+ key=\"login-from-new-device-notification\",\n+ distinct_id=str(user.distinct_id),\n+ groups={\"organization\": str(user.current_organization.id)},\n+ )\n+\n+ if not enabled:\n+ return\n+\n+ is_new_device = check_and_cache_login_device(user_id, ip_address, short_user_agent)\n+ if not is_new_device:\n+ return\n+\n+ login_time_str = login_time.strftime(\"%B %-d, %Y at %H:%M UTC\")\n+ geoip_data = get_geoip_properties(ip_address)\n+\n+ # Compose location as \"City, Country\" (omit city if missing)\n+ location = \", \".join(\n+ part\n+ for part in [geoip_data.get(\"$geoip_city_name\", \"\"), geoip_data.get(\"$geoip_country_name\", \"Unknown\")]\n+ if part\n+ )\n+\n+ message = EmailMessage(\n+ use_http=True,\n+ campaign_key=f\"login_notification_{user.uuid}-{timezone.now().timestamp()}\",\n+ template_name=\"login_notification\",\n+ subject=\"A new device logged into your account\",\n+ template_context={\n+ \"login_time\": login_time_str,\n+ \"ip_address\": ip_address,\n+ \"location\": location,\n+ \"browser\": short_user_agent,\n+ },\n+ )\n+ message.add_recipient(user.email)\n+ message.send()\n+ report_user_action(user=user, event=\"login notification sent\")",
"comment_created_at": "2025-08-15T17:03:38+00:00",
"comment_author": "a-lider",
"comment_body": "Also changed `report_user_action` to `ph_client.capture`\r\nIn v1, events weren\u2019t captured, Yasen suggested this might be because capture has its own queue and events don\u2019t always get sent before the Celery task finishes. ",
"pr_file_module": null
}
]
},
{
"discussion_id": "2269054010",
"pr_number": 36472,
"pr_file": "posthog/api/wizard/http.py",
"created_at": "2025-08-12T08:10:11+00:00",
"commented_code": ")\n \n project_api_token = project.passthrough_team.api_token\n- except Project.DoesNotExist:\n- raise serializers.ValidationError({\"projectId\": [\"This project does not exist.\"]}, code=\"not_found\")\n+ except Project.DoesNotExist as e:\n+ capture_exception(\n+ e,\n+ {\n+ \"project_id\": project_id,\n+ \"user_id\": request.user.id if request.user else None,\n+ \"user_distinct_id\": request.user.distinct_id if request.user else None,",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2269054010",
"repo_full_name": "PostHog/posthog",
"pr_number": 36472,
"pr_file": "posthog/api/wizard/http.py",
"discussion_id": "2269054010",
"commented_code": "@@ -295,8 +333,16 @@ def authenticate(self, request, **kwargs):\n )\n \n project_api_token = project.passthrough_team.api_token\n- except Project.DoesNotExist:\n- raise serializers.ValidationError({\"projectId\": [\"This project does not exist.\"]}, code=\"not_found\")\n+ except Project.DoesNotExist as e:\n+ capture_exception(\n+ e,\n+ {\n+ \"project_id\": project_id,\n+ \"user_id\": request.user.id if request.user else None,\n+ \"user_distinct_id\": request.user.distinct_id if request.user else None,",
"comment_created_at": "2025-08-12T08:10:11+00:00",
"comment_author": "JonathanLab",
"comment_body": "```suggestion\n \"user_distinct_id\": request.user.distinct_id if request.user else None,\n \"ai_product\": \"wizard\"\n```\n\nSuggest we maybe also set `\"team\": \"growth\"` on all of these exceptions? Makes it easier for us to auto-assign these to us in our error tracker",
"pr_file_module": null
},
{
"comment_id": "2270188107",
"repo_full_name": "PostHog/posthog",
"pr_number": 36472,
"pr_file": "posthog/api/wizard/http.py",
"discussion_id": "2269054010",
"commented_code": "@@ -295,8 +333,16 @@ def authenticate(self, request, **kwargs):\n )\n \n project_api_token = project.passthrough_team.api_token\n- except Project.DoesNotExist:\n- raise serializers.ValidationError({\"projectId\": [\"This project does not exist.\"]}, code=\"not_found\")\n+ except Project.DoesNotExist as e:\n+ capture_exception(\n+ e,\n+ {\n+ \"project_id\": project_id,\n+ \"user_id\": request.user.id if request.user else None,\n+ \"user_distinct_id\": request.user.distinct_id if request.user else None,",
"comment_created_at": "2025-08-12T15:01:59+00:00",
"comment_author": "daniloc",
"comment_body": "oh nice",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,41 @@
---
title: Enrich telemetry context
description: Always include relevant contextual metadata when capturing telemetry
data (events, exceptions, logs, metrics) to improve debugging and operational visibility.
This includes user information, team ownership, product areas, IP addresses, user
agents, geographic data, and other relevant context that helps with troubleshooting
and error assignment.
repository: PostHog/posthog
label: Observability
language: Python
comments_count: 2
repository_stars: 28460
---

Always include relevant contextual metadata when capturing telemetry data (events, exceptions, logs, metrics) to improve debugging and operational visibility. This includes user information, team ownership, product areas, IP addresses, user agents, geographic data, and other relevant context that helps with troubleshooting and error assignment.

When capturing exceptions, include contextual information like:

```python
capture_exception(
    e,
    {
        "project_id": project_id,
        "user_id": request.user.id if request.user else None,
        "user_distinct_id": request.user.distinct_id if request.user else None,
        "ai_product": "wizard",
        "team": "growth",  # makes it easier to auto-assign in the error tracker
    },
)
```

When logging events or actions, include debugging properties:

```python
report_user_action(
    user=user,
    event="login notification sent",
    properties={
        "ip_address": ip_address,
        "user_agent": short_user_agent,
        "location": location,
        "login_time": login_time_str,
    },
)
```

Rich contextual data transforms raw telemetry into actionable insights, enabling faster debugging, better error routing, and more effective monitoring.
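
One way to keep this context consistent across capture sites is a small shared helper — a minimal sketch, assuming an illustrative helper name and the same snippet style as the examples above:

```python
def build_request_context(request) -> dict:
    """Base debugging context attached to every capture call."""
    return {
        "user_id": request.user.id if request.user else None,
        "user_distinct_id": request.user.distinct_id if request.user else None,
        "team": "growth",  # routes exceptions to the owning team's tracker
    }


capture_exception(e, {**build_request_context(request), "ai_product": "wizard"})
```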

View File

@@ -0,0 +1,82 @@
[
{
"discussion_id": "2250946612",
"pr_number": 36086,
"pr_file": "plugin-server/src/cdp/consumers/cdp-behavioural-events.consumer.ts",
"created_at": "2025-08-04T09:35:56+00:00",
"commented_code": "protocolOptions: {\n port: hub.CASSANDRA_PORT,\n },\n+ sslOptions: isCloud()\n+ ? {\n+ ca: fs.readFileSync(join(__dirname, '../cassandra/ca.crt')),",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2250946612",
"repo_full_name": "PostHog/posthog",
"pr_number": 36086,
"pr_file": "plugin-server/src/cdp/consumers/cdp-behavioural-events.consumer.ts",
"discussion_id": "2250946612",
"commented_code": "@@ -84,12 +87,21 @@ export class CdpBehaviouralEventsConsumer extends CdpConsumerBase {\n protocolOptions: {\n port: hub.CASSANDRA_PORT,\n },\n+ sslOptions: isCloud()\n+ ? {\n+ ca: fs.readFileSync(join(__dirname, '../cassandra/ca.crt')),",
"comment_created_at": "2025-08-04T09:35:56+00:00",
"comment_author": "benjackwhite",
"comment_body": "Is it possible to this via a base64 encoded env? Mounting files is a headache",
"pr_file_module": null
},
{
"comment_id": "2250947762",
"repo_full_name": "PostHog/posthog",
"pr_number": 36086,
"pr_file": "plugin-server/src/cdp/consumers/cdp-behavioural-events.consumer.ts",
"discussion_id": "2250946612",
"commented_code": "@@ -84,12 +87,21 @@ export class CdpBehaviouralEventsConsumer extends CdpConsumerBase {\n protocolOptions: {\n port: hub.CASSANDRA_PORT,\n },\n+ sslOptions: isCloud()\n+ ? {\n+ ca: fs.readFileSync(join(__dirname, '../cassandra/ca.crt')),",
"comment_created_at": "2025-08-04T09:36:24+00:00",
"comment_author": "benjackwhite",
"comment_body": "also much prefer not relying on \"isCloud\" but rather purely relying on config envs. Reason being that we might have something different in dev to prod",
"pr_file_module": null
},
{
"comment_id": "2251219210",
"repo_full_name": "PostHog/posthog",
"pr_number": 36086,
"pr_file": "plugin-server/src/cdp/consumers/cdp-behavioural-events.consumer.ts",
"discussion_id": "2250946612",
"commented_code": "@@ -84,12 +87,21 @@ export class CdpBehaviouralEventsConsumer extends CdpConsumerBase {\n protocolOptions: {\n port: hub.CASSANDRA_PORT,\n },\n+ sslOptions: isCloud()\n+ ? {\n+ ca: fs.readFileSync(join(__dirname, '../cassandra/ca.crt')),",
"comment_created_at": "2025-08-04T11:37:25+00:00",
"comment_author": "meikelmosby",
"comment_body": "@benjackwhite i asked the same but Michis answer was\r\n\r\n```\r\nas we don't issue those, I think it could make more sense to download it on startup\r\nmeans: if they change it, we always get the latest version\r\neither we just put it in the command section of the container or we have an init container downloading it\r\nrecommended is the latter\r\n```\r\n\r\nyeah can make the change with not relying on isCloud ",
"pr_file_module": null
}
]
},
{
"discussion_id": "2247538006",
"pr_number": 35926,
"pr_file": "plugin-server/src/config/config.ts",
"created_at": "2025-08-01T10:02:26+00:00",
"commented_code": "PERSON_BATCH_WRITING_MAX_OPTIMISTIC_UPDATE_RETRIES: 5,\n PERSON_BATCH_WRITING_OPTIMISTIC_UPDATE_RETRY_INTERVAL_MS: 50,\n PERSON_UPDATE_CALCULATE_PROPERTIES_SIZE: 0,\n+ PERSON_PROPERTIES_SIZE_LIMIT: 1024 * 1024, // 1MB default",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2247538006",
"repo_full_name": "PostHog/posthog",
"pr_number": 35926,
"pr_file": "plugin-server/src/config/config.ts",
"discussion_id": "2247538006",
"commented_code": "@@ -265,6 +265,7 @@ export function getDefaultConfig(): PluginsServerConfig {\n PERSON_BATCH_WRITING_MAX_OPTIMISTIC_UPDATE_RETRIES: 5,\n PERSON_BATCH_WRITING_OPTIMISTIC_UPDATE_RETRY_INTERVAL_MS: 50,\n PERSON_UPDATE_CALCULATE_PROPERTIES_SIZE: 0,\n+ PERSON_PROPERTIES_SIZE_LIMIT: 1024 * 1024, // 1MB default",
"comment_created_at": "2025-08-01T10:02:26+00:00",
"comment_author": "pl",
"comment_body": "nit: I was thinking maybe we should set it to 512KB default - we publish those updates to Kafka and I think the limit is 1MB, so it would be good to set the default to a value that won't breach it.",
"pr_file_module": null
},
{
"comment_id": "2258173746",
"repo_full_name": "PostHog/posthog",
"pr_number": 35926,
"pr_file": "plugin-server/src/config/config.ts",
"discussion_id": "2247538006",
"commented_code": "@@ -265,6 +265,7 @@ export function getDefaultConfig(): PluginsServerConfig {\n PERSON_BATCH_WRITING_MAX_OPTIMISTIC_UPDATE_RETRIES: 5,\n PERSON_BATCH_WRITING_OPTIMISTIC_UPDATE_RETRY_INTERVAL_MS: 50,\n PERSON_UPDATE_CALCULATE_PROPERTIES_SIZE: 0,\n+ PERSON_PROPERTIES_SIZE_LIMIT: 1024 * 1024, // 1MB default",
"comment_created_at": "2025-08-06T20:08:51+00:00",
"comment_author": "nickbest-ph",
"comment_body": "yep, that soudns good to me.",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,41 @@
---
title: Environment-based configuration management
description: Prefer environment variables over file mounting for configuration values,
and avoid hardcoded environment-specific conditionals like `isCloud()`. Instead,
rely purely on configuration environment variables that can be set differently across
environments. When setting default configuration values, consider the constraints
and limits of downstream systems to prevent runtime failures.
repository: PostHog/posthog
label: Configurations
language: TypeScript
comments_count: 2
repository_stars: 28460
---

Prefer environment variables over file mounting for configuration values, and avoid hardcoded environment-specific conditionals like `isCloud()`. Instead, rely purely on configuration environment variables that can be set differently across environments. When setting default configuration values, consider the constraints and limits of downstream systems to prevent runtime failures.

For example, instead of:

```typescript
sslOptions: isCloud()
    ? {
          ca: fs.readFileSync(join(__dirname, '../cassandra/ca.crt')),
      }
    : undefined
```

Use environment variables:

```typescript
sslOptions: process.env.CASSANDRA_SSL_CA
    ? {
          ca: Buffer.from(process.env.CASSANDRA_SSL_CA, 'base64'),
      }
    : undefined
```

And when setting defaults, consider system limits:

```typescript
// Consider Kafka's 1MB message limit when setting defaults
PERSON_PROPERTIES_SIZE_LIMIT: 512 * 1024, // 512KB default (safe margin under the 1MB Kafka limit)
```

This approach makes configuration more flexible and testable, and avoids deployment complexities like file mounting, while ensuring defaults don't cause system failures.
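
If the CA does arrive via an env var, it's worth validating it once at startup so a malformed value fails fast rather than at first connection. A minimal sketch, assuming the `CASSANDRA_SSL_CA` variable name from the example above:

```typescript
function loadCassandraCa(): Buffer | undefined {
    const raw = process.env.CASSANDRA_SSL_CA
    if (!raw) {
        return undefined // SSL simply stays off when the variable is unset
    }
    const ca = Buffer.from(raw, 'base64')
    if (!ca.toString('utf8').includes('BEGIN CERTIFICATE')) {
        throw new Error('CASSANDRA_SSL_CA is set but is not base64-encoded PEM')
    }
    return ca
}
```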

View File

@@ -0,0 +1,70 @@
[
{
"discussion_id": "2219466081",
"pr_number": 35030,
"pr_file": "frontend/src/scenes/funnels/FunnelBarHorizontal/FunnelBarHorizontal.scss",
"created_at": "2025-07-21T14:54:22+00:00",
"commented_code": "padding-left: 1rem;\n }\n \n+ // When there are optional steps, indent all content to make room for the layout\n+ &.has-optional-steps {",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2219466081",
"repo_full_name": "PostHog/posthog",
"pr_number": 35030,
"pr_file": "frontend/src/scenes/funnels/FunnelBarHorizontal/FunnelBarHorizontal.scss",
"discussion_id": "2219466081",
"commented_code": "@@ -13,6 +13,13 @@ $glyph_height: 23px; // Based on .funnel-step-glyph\n padding-left: 1rem;\n }\n \n+ // When there are optional steps, indent all content to make room for the layout\n+ &.has-optional-steps {",
"comment_created_at": "2025-07-21T14:54:22+00:00",
"comment_author": "thmsobrmlr",
"comment_body": "nit: we mostly use BEM-style class names, so `&.FunnelBarHorizontal--has-optional-steps` would fit that pattern better",
"pr_file_module": null
}
]
},
{
"discussion_id": "2253520336",
"pr_number": 36116,
"pr_file": "products/error_tracking/frontend/ErrorTracking.scss",
"created_at": "2025-08-05T08:16:38+00:00",
"commented_code": "background-color: inherit;\n }\n }\n+\n+body[theme='light'] .ErrorTrackingIssue {\n+ --gray-1: var(--primitive-neutral-50);\n+ --gray-2: var(--primitive-neutral-100);\n+ --gray-3: var(--primitive-neutral-150);\n+}\n+\n+body[theme='dark'] .ErrorTrackingIssue {\n+ --gray-1: var(--primitive-neutral-cool-900);\n+ --gray-2: var(--primitive-neutral-cool-850);\n+ --gray-3: var(--primitive-neutral-cool-800);\n+}",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2253520336",
"repo_full_name": "PostHog/posthog",
"pr_number": 36116,
"pr_file": "products/error_tracking/frontend/ErrorTracking.scss",
"discussion_id": "2253520336",
"commented_code": "@@ -7,3 +7,15 @@\n background-color: inherit;\n }\n }\n+\n+body[theme='light'] .ErrorTrackingIssue {\n+ --gray-1: var(--primitive-neutral-50);\n+ --gray-2: var(--primitive-neutral-100);\n+ --gray-3: var(--primitive-neutral-150);\n+}\n+\n+body[theme='dark'] .ErrorTrackingIssue {\n+ --gray-1: var(--primitive-neutral-cool-900);\n+ --gray-2: var(--primitive-neutral-cool-850);\n+ --gray-3: var(--primitive-neutral-cool-800);\n+}",
"comment_created_at": "2025-08-05T08:16:38+00:00",
"comment_author": "daibhin",
"comment_body": "What's the thinking behind these custom colours? And specifically renaming them?",
"pr_file_module": null
},
{
"comment_id": "2254061538",
"repo_full_name": "PostHog/posthog",
"pr_number": 36116,
"pr_file": "products/error_tracking/frontend/ErrorTracking.scss",
"discussion_id": "2253520336",
"commented_code": "@@ -7,3 +7,15 @@\n background-color: inherit;\n }\n }\n+\n+body[theme='light'] .ErrorTrackingIssue {\n+ --gray-1: var(--primitive-neutral-50);\n+ --gray-2: var(--primitive-neutral-100);\n+ --gray-3: var(--primitive-neutral-150);\n+}\n+\n+body[theme='dark'] .ErrorTrackingIssue {\n+ --gray-1: var(--primitive-neutral-cool-900);\n+ --gray-2: var(--primitive-neutral-cool-850);\n+ --gray-3: var(--primitive-neutral-cool-800);\n+}",
"comment_created_at": "2025-08-05T11:30:16+00:00",
"comment_author": "hpouillot",
"comment_body": "The goal is to not worry about the theme when using those colors (as they will switch automatically) and have convenient shortcuts for tailwind. Naming is inspired by [radix colors](https://www.radix-ui.com/colors).",
"pr_file_module": null
},
{
"comment_id": "2254877711",
"repo_full_name": "PostHog/posthog",
"pr_number": 36116,
"pr_file": "products/error_tracking/frontend/ErrorTracking.scss",
"discussion_id": "2253520336",
"commented_code": "@@ -7,3 +7,15 @@\n background-color: inherit;\n }\n }\n+\n+body[theme='light'] .ErrorTrackingIssue {\n+ --gray-1: var(--primitive-neutral-50);\n+ --gray-2: var(--primitive-neutral-100);\n+ --gray-3: var(--primitive-neutral-150);\n+}\n+\n+body[theme='dark'] .ErrorTrackingIssue {\n+ --gray-1: var(--primitive-neutral-cool-900);\n+ --gray-2: var(--primitive-neutral-cool-850);\n+ --gray-3: var(--primitive-neutral-cool-800);\n+}",
"comment_created_at": "2025-08-05T17:04:19+00:00",
"comment_author": "daibhin",
"comment_body": "Fair enough. I have little context on where we're at with colour but @adamleithp might have some suggestions about how we want to implement this kind of thing",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,20 @@
---
title: Follow CSS naming patterns
description: Maintain consistency with established CSS naming conventions already
used in the codebase. For CSS classes, follow BEM methodology when that's the existing
pattern. For custom properties, use semantic names that clearly indicate their purpose
and work across themes.
repository: PostHog/posthog
label: Naming Conventions
language: Css
comments_count: 2
repository_stars: 28460
---

Maintain consistency with established CSS naming conventions already used in the codebase. For CSS classes, follow BEM methodology when that's the existing pattern. For custom properties, use semantic names that clearly indicate their purpose and work across themes.

Examples:

- Use BEM-style class names: `&.FunnelBarHorizontal--has-optional-steps` instead of `&.has-optional-steps`
- Use semantic CSS custom properties: `--gray-1`, `--gray-2` for theme-agnostic color variables that automatically adapt to light/dark themes

This ensures code maintainability and helps other developers quickly understand the naming system in use.
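
A minimal sketch combining both conventions, with selector names mirroring the discussions above (the padding value is illustrative):

```scss
.FunnelBarHorizontal {
    // BEM modifier scoped to the block, not a bare state class
    &.FunnelBarHorizontal--has-optional-steps {
        padding-left: 1rem;
    }
}

// Theme-agnostic variables: consumers reference --gray-1 and the right
// primitive is swapped in per theme automatically
body[theme='light'] .ErrorTrackingIssue {
    --gray-1: var(--primitive-neutral-50);
}

body[theme='dark'] .ErrorTrackingIssue {
    --gray-1: var(--primitive-neutral-cool-900);
}
```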

View File

@@ -0,0 +1,70 @@
[
{
"discussion_id": "2250796145",
"pr_number": 35753,
"pr_file": "frontend/src/lib/components/TaxonomicFilter/taxonomicFilterLogic.tsx",
"created_at": "2025-08-04T08:33:31+00:00",
"commented_code": "() => [(_, props) => props.allowNonCapturedEvents],\n (allowNonCapturedEvents: boolean | undefined) => allowNonCapturedEvents ?? false,\n ],\n+ enablePreaggregatedTableHints: [\n+ () => [(_, props) => props.enablePreaggregatedTableHints],\n+ (enablePreaggregatedTableHints) => !!enablePreaggregatedTableHints,\n+ ],\n taxonomicGroups: [\n- (s) => [\n- s.currentTeam,\n- s.currentProjectId,\n- s.groupAnalyticsTaxonomicGroups,\n- s.groupAnalyticsTaxonomicGroupNames,\n- s.eventNames,\n- s.schemaColumns,\n- s.metadataSource,\n- s.propertyFilters,\n- s.eventMetadataPropertyDefinitions,\n- s.eventOrdering,\n- s.maxContextOptions,\n- ],\n+ (s) =>\n+ [\n+ s.currentTeam,\n+ s.currentProjectId,\n+ s.groupAnalyticsTaxonomicGroups,\n+ s.groupAnalyticsTaxonomicGroupNames,\n+ s.eventNames,\n+ s.schemaColumns,\n+ s.metadataSource,\n+ s.propertyFilters,\n+ s.eventMetadataPropertyDefinitions,\n+ s.eventOrdering,\n+ s.maxContextOptions,\n+ s.enablePreaggregatedTableHints,\n+ ] as any, // workaround as Kea's SelectorTuple has a limit of 11 items: https://github.com/keajs/kea/blob/v3.1.5/src/types.ts#L162-L174",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2250796145",
"repo_full_name": "PostHog/posthog",
"pr_number": 35753,
"pr_file": "frontend/src/lib/components/TaxonomicFilter/taxonomicFilterLogic.tsx",
"discussion_id": "2250796145",
"commented_code": "@@ -207,20 +208,26 @@ export const taxonomicFilterLogic = kea<taxonomicFilterLogicType>([\n () => [(_, props) => props.allowNonCapturedEvents],\n (allowNonCapturedEvents: boolean | undefined) => allowNonCapturedEvents ?? false,\n ],\n+ enablePreaggregatedTableHints: [\n+ () => [(_, props) => props.enablePreaggregatedTableHints],\n+ (enablePreaggregatedTableHints) => !!enablePreaggregatedTableHints,\n+ ],\n taxonomicGroups: [\n- (s) => [\n- s.currentTeam,\n- s.currentProjectId,\n- s.groupAnalyticsTaxonomicGroups,\n- s.groupAnalyticsTaxonomicGroupNames,\n- s.eventNames,\n- s.schemaColumns,\n- s.metadataSource,\n- s.propertyFilters,\n- s.eventMetadataPropertyDefinitions,\n- s.eventOrdering,\n- s.maxContextOptions,\n- ],\n+ (s) =>\n+ [\n+ s.currentTeam,\n+ s.currentProjectId,\n+ s.groupAnalyticsTaxonomicGroups,\n+ s.groupAnalyticsTaxonomicGroupNames,\n+ s.eventNames,\n+ s.schemaColumns,\n+ s.metadataSource,\n+ s.propertyFilters,\n+ s.eventMetadataPropertyDefinitions,\n+ s.eventOrdering,\n+ s.maxContextOptions,\n+ s.enablePreaggregatedTableHints,\n+ ] as any, // workaround as Kea's SelectorTuple has a limit of 11 items: https://github.com/keajs/kea/blob/v3.1.5/src/types.ts#L162-L174",
"comment_created_at": "2025-08-04T08:33:31+00:00",
"comment_author": "robbie-c",
"comment_body": "I've worked around these before by grouping them up and making another selector just to get a group of props\r\n\r\ne.g. you could add a selector that returns {currentTeam, currentProjectId} and then taxonomicGroups can use that selector.",
"pr_file_module": null
},
{
"comment_id": "2250881834",
"repo_full_name": "PostHog/posthog",
"pr_number": 35753,
"pr_file": "frontend/src/lib/components/TaxonomicFilter/taxonomicFilterLogic.tsx",
"discussion_id": "2250796145",
"commented_code": "@@ -207,20 +208,26 @@ export const taxonomicFilterLogic = kea<taxonomicFilterLogicType>([\n () => [(_, props) => props.allowNonCapturedEvents],\n (allowNonCapturedEvents: boolean | undefined) => allowNonCapturedEvents ?? false,\n ],\n+ enablePreaggregatedTableHints: [\n+ () => [(_, props) => props.enablePreaggregatedTableHints],\n+ (enablePreaggregatedTableHints) => !!enablePreaggregatedTableHints,\n+ ],\n taxonomicGroups: [\n- (s) => [\n- s.currentTeam,\n- s.currentProjectId,\n- s.groupAnalyticsTaxonomicGroups,\n- s.groupAnalyticsTaxonomicGroupNames,\n- s.eventNames,\n- s.schemaColumns,\n- s.metadataSource,\n- s.propertyFilters,\n- s.eventMetadataPropertyDefinitions,\n- s.eventOrdering,\n- s.maxContextOptions,\n- ],\n+ (s) =>\n+ [\n+ s.currentTeam,\n+ s.currentProjectId,\n+ s.groupAnalyticsTaxonomicGroups,\n+ s.groupAnalyticsTaxonomicGroupNames,\n+ s.eventNames,\n+ s.schemaColumns,\n+ s.metadataSource,\n+ s.propertyFilters,\n+ s.eventMetadataPropertyDefinitions,\n+ s.eventOrdering,\n+ s.maxContextOptions,\n+ s.enablePreaggregatedTableHints,\n+ ] as any, // workaround as Kea's SelectorTuple has a limit of 11 items: https://github.com/keajs/kea/blob/v3.1.5/src/types.ts#L162-L174",
"comment_created_at": "2025-08-04T09:09:02+00:00",
"comment_author": "lricoy",
"comment_body": "This is a nice option. Rafa and Marius updated Kea, and now we can go up to 16 \ud83c\udf89 ",
"pr_file_module": null
}
]
},
{
"discussion_id": "2270824949",
"pr_number": 36474,
"pr_file": "frontend/src/scenes/data-warehouse/DataWarehouseScene.tsx",
"created_at": "2025-08-12T18:48:36+00:00",
"commented_code": "+import { useMemo, useState, useEffect } from 'react'\n import { SceneExport } from 'scenes/sceneTypes'\n+import { PipelineTab } from '~/types'\n import { FEATURE_FLAGS } from 'lib/constants'\n import { featureFlagLogic } from 'lib/logic/featureFlagLogic'\n import { useValues } from 'kea'\n import { NotFound } from 'lib/components/NotFound'\n+import { urls } from 'scenes/urls'\n+import { LemonButton, LemonCard, LemonTag, Tooltip } from '@posthog/lemon-ui'\n+import { PaginationControl, usePagination } from 'lib/lemon-ui/PaginationControl'\n+import { IconPlusSmall, IconCheckCircle, IconInfo } from '@posthog/icons'\n+import { IconOpenInNew } from 'lib/lemon-ui/icons'\n+import { dataWarehouseSettingsLogic } from './settings/dataWarehouseSettingsLogic'\n+import { dataWarehouseSceneLogic } from './settings/dataWarehouseSceneLogic'\n+import { TZLabel } from 'lib/components/TZLabel'\n+import { IconCancel, IconSync, IconExclamation, IconRadioButtonUnchecked } from 'lib/lemon-ui/icons'\n+import { fetchRecentActivity, fetchTotalRowsProcessed, type UnifiedRecentActivity } from './externalDataSourcesLogic'\n \n-export const scene: SceneExport = {\n- component: DataWarehouseScene,\n+export const scene: SceneExport = { component: DataWarehouseScene }\n+\n+const LIST_SIZE = 5\n+\n+const getSourceType = (sourceType: string, availableSources?: Record<string, any> | null): 'Database' | 'API' => {\n+ const fields = availableSources?.[sourceType]?.fields || []\n+ if (fields.some((f: any) => f.name === 'connection_string' || ['host', 'port', 'database'].includes(f.name))) {\n+ return 'Database'\n+ }\n+ if (fields.some((f: any) => f.type === 'oauth' || ['api_key', 'access_token'].includes(f.name))) {\n+ return 'API'\n+ }\n+ return 'API'\n+}\n+\n+interface DashboardDataSource {\n+ id: string\n+ name: string\n+ type: 'Database' | 'API'\n+ status: string | null\n+ lastSync: string | null\n+ rowCount: number | null\n+ url: string\n }\n \n export function DataWarehouseScene(): JSX.Element {\n const { featureFlags } = useValues(featureFlagLogic)\n+ const { dataWarehouseSources, selfManagedTables } = useValues(dataWarehouseSettingsLogic)\n+ const { materializedViews } = useValues(dataWarehouseSceneLogic)\n+\n+ const [recentActivity, setRecentActivity] = useState<UnifiedRecentActivity[]>([])\n+ const [totalRowsProcessed, setTotalRowsProcessed] = useState<number>(0)\n+\n+ useEffect(() => {\n+ const loadData = async (): Promise<void> => {\n+ const [activities, totalRows] = await Promise.all([\n+ fetchRecentActivity(dataWarehouseSources?.results || [], materializedViews),\n+ fetchTotalRowsProcessed(dataWarehouseSources?.results || [], materializedViews),\n+ ])\n+ setRecentActivity(activities)\n+ setTotalRowsProcessed(totalRows)\n+ }\n+\n+ if ((dataWarehouseSources?.results?.length || 0) > 0 || materializedViews.length > 0) {\n+ loadData()\n+ }\n+ }, [dataWarehouseSources?.results, materializedViews])\n+\n+ const allSources = useMemo(",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2270824949",
"repo_full_name": "PostHog/posthog",
"pr_number": 36474,
"pr_file": "frontend/src/scenes/data-warehouse/DataWarehouseScene.tsx",
"discussion_id": "2270824949",
"commented_code": "@@ -1,43 +1,358 @@\n+import { useMemo, useState, useEffect } from 'react'\n import { SceneExport } from 'scenes/sceneTypes'\n+import { PipelineTab } from '~/types'\n import { FEATURE_FLAGS } from 'lib/constants'\n import { featureFlagLogic } from 'lib/logic/featureFlagLogic'\n import { useValues } from 'kea'\n import { NotFound } from 'lib/components/NotFound'\n+import { urls } from 'scenes/urls'\n+import { LemonButton, LemonCard, LemonTag, Tooltip } from '@posthog/lemon-ui'\n+import { PaginationControl, usePagination } from 'lib/lemon-ui/PaginationControl'\n+import { IconPlusSmall, IconCheckCircle, IconInfo } from '@posthog/icons'\n+import { IconOpenInNew } from 'lib/lemon-ui/icons'\n+import { dataWarehouseSettingsLogic } from './settings/dataWarehouseSettingsLogic'\n+import { dataWarehouseSceneLogic } from './settings/dataWarehouseSceneLogic'\n+import { TZLabel } from 'lib/components/TZLabel'\n+import { IconCancel, IconSync, IconExclamation, IconRadioButtonUnchecked } from 'lib/lemon-ui/icons'\n+import { fetchRecentActivity, fetchTotalRowsProcessed, type UnifiedRecentActivity } from './externalDataSourcesLogic'\n \n-export const scene: SceneExport = {\n- component: DataWarehouseScene,\n+export const scene: SceneExport = { component: DataWarehouseScene }\n+\n+const LIST_SIZE = 5\n+\n+const getSourceType = (sourceType: string, availableSources?: Record<string, any> | null): 'Database' | 'API' => {\n+ const fields = availableSources?.[sourceType]?.fields || []\n+ if (fields.some((f: any) => f.name === 'connection_string' || ['host', 'port', 'database'].includes(f.name))) {\n+ return 'Database'\n+ }\n+ if (fields.some((f: any) => f.type === 'oauth' || ['api_key', 'access_token'].includes(f.name))) {\n+ return 'API'\n+ }\n+ return 'API'\n+}\n+\n+interface DashboardDataSource {\n+ id: string\n+ name: string\n+ type: 'Database' | 'API'\n+ status: string | null\n+ lastSync: string | null\n+ rowCount: number | null\n+ url: string\n }\n \n export function DataWarehouseScene(): JSX.Element {\n const { featureFlags } = useValues(featureFlagLogic)\n+ const { dataWarehouseSources, selfManagedTables } = useValues(dataWarehouseSettingsLogic)\n+ const { materializedViews } = useValues(dataWarehouseSceneLogic)\n+\n+ const [recentActivity, setRecentActivity] = useState<UnifiedRecentActivity[]>([])\n+ const [totalRowsProcessed, setTotalRowsProcessed] = useState<number>(0)\n+\n+ useEffect(() => {\n+ const loadData = async (): Promise<void> => {\n+ const [activities, totalRows] = await Promise.all([\n+ fetchRecentActivity(dataWarehouseSources?.results || [], materializedViews),\n+ fetchTotalRowsProcessed(dataWarehouseSources?.results || [], materializedViews),\n+ ])\n+ setRecentActivity(activities)\n+ setTotalRowsProcessed(totalRows)\n+ }\n+\n+ if ((dataWarehouseSources?.results?.length || 0) > 0 || materializedViews.length > 0) {\n+ loadData()\n+ }\n+ }, [dataWarehouseSources?.results, materializedViews])\n+\n+ const allSources = useMemo(",
"comment_created_at": "2025-08-12T18:48:36+00:00",
"comment_author": "EDsCODE",
"comment_body": "All of this logic should be contained within kea too. The rule of thumb is state logic should pretty much not exist in the components themselves besides pulling in actions and values. ",
"pr_file_module": null
},
{
"comment_id": "2271667725",
"repo_full_name": "PostHog/posthog",
"pr_number": 36474,
"pr_file": "frontend/src/scenes/data-warehouse/DataWarehouseScene.tsx",
"discussion_id": "2270824949",
"commented_code": "@@ -1,43 +1,358 @@\n+import { useMemo, useState, useEffect } from 'react'\n import { SceneExport } from 'scenes/sceneTypes'\n+import { PipelineTab } from '~/types'\n import { FEATURE_FLAGS } from 'lib/constants'\n import { featureFlagLogic } from 'lib/logic/featureFlagLogic'\n import { useValues } from 'kea'\n import { NotFound } from 'lib/components/NotFound'\n+import { urls } from 'scenes/urls'\n+import { LemonButton, LemonCard, LemonTag, Tooltip } from '@posthog/lemon-ui'\n+import { PaginationControl, usePagination } from 'lib/lemon-ui/PaginationControl'\n+import { IconPlusSmall, IconCheckCircle, IconInfo } from '@posthog/icons'\n+import { IconOpenInNew } from 'lib/lemon-ui/icons'\n+import { dataWarehouseSettingsLogic } from './settings/dataWarehouseSettingsLogic'\n+import { dataWarehouseSceneLogic } from './settings/dataWarehouseSceneLogic'\n+import { TZLabel } from 'lib/components/TZLabel'\n+import { IconCancel, IconSync, IconExclamation, IconRadioButtonUnchecked } from 'lib/lemon-ui/icons'\n+import { fetchRecentActivity, fetchTotalRowsProcessed, type UnifiedRecentActivity } from './externalDataSourcesLogic'\n \n-export const scene: SceneExport = {\n- component: DataWarehouseScene,\n+export const scene: SceneExport = { component: DataWarehouseScene }\n+\n+const LIST_SIZE = 5\n+\n+const getSourceType = (sourceType: string, availableSources?: Record<string, any> | null): 'Database' | 'API' => {\n+ const fields = availableSources?.[sourceType]?.fields || []\n+ if (fields.some((f: any) => f.name === 'connection_string' || ['host', 'port', 'database'].includes(f.name))) {\n+ return 'Database'\n+ }\n+ if (fields.some((f: any) => f.type === 'oauth' || ['api_key', 'access_token'].includes(f.name))) {\n+ return 'API'\n+ }\n+ return 'API'\n+}\n+\n+interface DashboardDataSource {\n+ id: string\n+ name: string\n+ type: 'Database' | 'API'\n+ status: string | null\n+ lastSync: string | null\n+ rowCount: number | null\n+ url: string\n }\n \n export function DataWarehouseScene(): JSX.Element {\n const { featureFlags } = useValues(featureFlagLogic)\n+ const { dataWarehouseSources, selfManagedTables } = useValues(dataWarehouseSettingsLogic)\n+ const { materializedViews } = useValues(dataWarehouseSceneLogic)\n+\n+ const [recentActivity, setRecentActivity] = useState<UnifiedRecentActivity[]>([])\n+ const [totalRowsProcessed, setTotalRowsProcessed] = useState<number>(0)\n+\n+ useEffect(() => {\n+ const loadData = async (): Promise<void> => {\n+ const [activities, totalRows] = await Promise.all([\n+ fetchRecentActivity(dataWarehouseSources?.results || [], materializedViews),\n+ fetchTotalRowsProcessed(dataWarehouseSources?.results || [], materializedViews),\n+ ])\n+ setRecentActivity(activities)\n+ setTotalRowsProcessed(totalRows)\n+ }\n+\n+ if ((dataWarehouseSources?.results?.length || 0) > 0 || materializedViews.length > 0) {\n+ loadData()\n+ }\n+ }, [dataWarehouseSources?.results, materializedViews])\n+\n+ const allSources = useMemo(",
"comment_created_at": "2025-08-13T00:03:29+00:00",
"comment_author": "naumaanh",
"comment_body": "got it, i'll logically adjust this to go into kea, thanks for the rule of thumb i've made sure to note it down \ud83d\udc4d ",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,49 @@
---
title: Keep state in Kea
description: React components should focus on presentation and user interaction, not
state management logic. All state logic should be contained within Kea stores, with
components only pulling in actions and values. This separation improves testability,
reusability, and maintainability.
repository: PostHog/posthog
label: React
language: TSX
comments_count: 2
repository_stars: 28460
---

React components should focus on presentation and user interaction, not state management logic. All state logic should be contained within Kea stores, with components only pulling in actions and values. This separation improves testability, reusability, and maintainability.

Components should avoid:

- useState and useEffect for business logic
- Complex state calculations
- Direct API calls or data fetching

Instead, delegate these responsibilities to Kea logic files and consume the results:

```tsx
// ❌ Avoid: State logic in component
export function DataWarehouseScene(): JSX.Element {
    const [recentActivity, setRecentActivity] = useState<UnifiedRecentActivity[]>([])
    const [totalRowsProcessed, setTotalRowsProcessed] = useState<number>(0)

    useEffect(() => {
        const loadData = async (): Promise<void> => {
            const [activities, totalRows] = await Promise.all([
                fetchRecentActivity(dataWarehouseSources?.results || [], materializedViews),
                fetchTotalRowsProcessed(dataWarehouseSources?.results || [], materializedViews),
            ])
            setRecentActivity(activities)
            setTotalRowsProcessed(totalRows)
        }
        loadData()
    }, [dataWarehouseSources?.results, materializedViews])
}

// ✅ Preferred: State logic in Kea, component consumes values
export function DataWarehouseScene(): JSX.Element {
    const { recentActivity, totalRowsProcessed } = useValues(dataWarehouseLogic)
    const { loadDashboardData } = useActions(dataWarehouseLogic)
}
```

This pattern ensures components remain focused on rendering and user interactions while keeping business logic centralized and testable.
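
On the Kea side, a loader can own the fetching. A minimal sketch using kea-loaders, with hypothetical logic and fetcher names (not the real `dataWarehouseLogic`):

```tsx
import { kea, path } from 'kea'
import { loaders } from 'kea-loaders'

// Hypothetical fetcher; in real code this would live in a shared module
async function fetchDashboardActivity(): Promise<string[]> {
    return []
}

export const dataWarehouseDashboardLogic = kea([
    path(['scenes', 'data-warehouse', 'dataWarehouseDashboardLogic']),
    loaders({
        recentActivity: [
            [] as string[],
            {
                // exposes a `loadRecentActivity` action plus `recentActivity`
                // and `recentActivityLoading` values for components to consume
                loadRecentActivity: async () => await fetchDashboardActivity(),
            },
        ],
    }),
])
```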

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,36 @@
---
title: Leverage framework capabilities
description: Structure workflows and activities to take full advantage of the orchestration
framework's built-in capabilities for fault tolerance, retries, and durable execution
rather than implementing custom solutions.
repository: PostHog/posthog
label: Temporal
language: Python
comments_count: 2
repository_stars: 28460
---

Structure workflows and activities to take full advantage of the orchestration framework's built-in capabilities for fault tolerance, retries, and durable execution rather than implementing custom solutions.

Break down complex operations into separate, resumable steps that can leverage the framework's retry and restart mechanisms. Avoid duplicating functionality that the framework already provides automatically.

For example, instead of implementing a single monolithic operation that handles multiple steps internally:

```python
import dagster
from dagster_aws.s3 import S3Resource

# Avoid: a single operation that can't resume individual steps
@dagster.op
def snapshot_all_project_data(context, project_id: int, s3: S3Resource):
    snapshot_postgres_data(...)  # If this succeeds but the next call fails,
    snapshot_clickhouse_data(...)  # the whole operation must re-run from the start

# Prefer: separate operations that can be retried independently
@dagster.op
def snapshot_postgres_project_data(context, project_id: int, s3: S3Resource):
    ...  # Can be retried/resumed independently

@dagster.op
def snapshot_clickhouse_project_data(context, project_id: int, s3: S3Resource):
    ...  # Can be retried/resumed independently
```

Similarly, remove redundant code when the framework already provides the needed functionality automatically, such as context attributes for metrics or built-in retry policies.

View File

@@ -0,0 +1,94 @@
[
{
"discussion_id": "2273119714",
"pr_number": 36561,
"pr_file": "bin/start",
"created_at": "2025-08-13T11:41:43+00:00",
"commented_code": "fi\n fi\n \n-# Use minimal config if --minimal flag is passed\n-if [[ \"$*\" == *\"--minimal\"* ]]; then\n+# Check for conflicting flags\n+if [[ \"$*\" == *\"--custom\"* ]] && ([[ \"$*\" == *\"--minimal\"* ]] || [[ \"$*\" == *\"--vite\"* ]]); then\n+ echo \"Error: Cannot use --custom with --minimal or --vite\"\n+ exit 1\n+fi\n+\n+# Use custom config, if provided (e.g. bin/start --custom bin/mprocs-custom.yaml)",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2273119714",
"repo_full_name": "PostHog/posthog",
"pr_number": 36561,
"pr_file": "bin/start",
"discussion_id": "2273119714",
"commented_code": "@@ -57,9 +57,36 @@ if ! command -v mprocs &>/dev/null; then\n fi\n fi\n \n-# Use minimal config if --minimal flag is passed\n-if [[ \"$*\" == *\"--minimal\"* ]]; then\n+# Check for conflicting flags\n+if [[ \"$*\" == *\"--custom\"* ]] && ([[ \"$*\" == *\"--minimal\"* ]] || [[ \"$*\" == *\"--vite\"* ]]); then\n+ echo \"Error: Cannot use --custom with --minimal or --vite\"\n+ exit 1\n+fi\n+\n+# Use custom config, if provided (e.g. bin/start --custom bin/mprocs-custom.yaml)",
"comment_created_at": "2025-08-13T11:41:43+00:00",
"comment_author": "skoob13",
"comment_body": "Should we exclude in git custom configs? Like `bin/mprocs*.local.yaml` in gitignore.",
"pr_file_module": null
},
{
"comment_id": "2273133457",
"repo_full_name": "PostHog/posthog",
"pr_number": 36561,
"pr_file": "bin/start",
"discussion_id": "2273119714",
"commented_code": "@@ -57,9 +57,36 @@ if ! command -v mprocs &>/dev/null; then\n fi\n fi\n \n-# Use minimal config if --minimal flag is passed\n-if [[ \"$*\" == *\"--minimal\"* ]]; then\n+# Check for conflicting flags\n+if [[ \"$*\" == *\"--custom\"* ]] && ([[ \"$*\" == *\"--minimal\"* ]] || [[ \"$*\" == *\"--vite\"* ]]); then\n+ echo \"Error: Cannot use --custom with --minimal or --vite\"\n+ exit 1\n+fi\n+\n+# Use custom config, if provided (e.g. bin/start --custom bin/mprocs-custom.yaml)",
"comment_created_at": "2025-08-13T11:46:15+00:00",
"comment_author": "rossgray",
"comment_body": "this is a good idea",
"pr_file_module": null
},
{
"comment_id": "2273137531",
"repo_full_name": "PostHog/posthog",
"pr_number": 36561,
"pr_file": "bin/start",
"discussion_id": "2273119714",
"commented_code": "@@ -57,9 +57,36 @@ if ! command -v mprocs &>/dev/null; then\n fi\n fi\n \n-# Use minimal config if --minimal flag is passed\n-if [[ \"$*\" == *\"--minimal\"* ]]; then\n+# Check for conflicting flags\n+if [[ \"$*\" == *\"--custom\"* ]] && ([[ \"$*\" == *\"--minimal\"* ]] || [[ \"$*\" == *\"--vite\"* ]]); then\n+ echo \"Error: Cannot use --custom with --minimal or --vite\"\n+ exit 1\n+fi\n+\n+# Use custom config, if provided (e.g. bin/start --custom bin/mprocs-custom.yaml)",
"comment_created_at": "2025-08-13T11:47:37+00:00",
"comment_author": "sortafreel",
"comment_body": "I assume, it depends on where you store your configs. I have a `playground` directory (inside each project) added to the global `gitignore` where I put stuff like this. \r\n\r\nAdding it to `.gitignore` in the suggested way would force a location/naming preference, as I see it. But it won't hurt, also. Happy to add.",
"pr_file_module": null
},
{
"comment_id": "2273144126",
"repo_full_name": "PostHog/posthog",
"pr_number": 36561,
"pr_file": "bin/start",
"discussion_id": "2273119714",
"commented_code": "@@ -57,9 +57,36 @@ if ! command -v mprocs &>/dev/null; then\n fi\n fi\n \n-# Use minimal config if --minimal flag is passed\n-if [[ \"$*\" == *\"--minimal\"* ]]; then\n+# Check for conflicting flags\n+if [[ \"$*\" == *\"--custom\"* ]] && ([[ \"$*\" == *\"--minimal\"* ]] || [[ \"$*\" == *\"--vite\"* ]]); then\n+ echo \"Error: Cannot use --custom with --minimal or --vite\"\n+ exit 1\n+fi\n+\n+# Use custom config, if provided (e.g. bin/start --custom bin/mprocs-custom.yaml)",
"comment_created_at": "2025-08-13T11:49:41+00:00",
"comment_author": "sortafreel",
"comment_body": "https://github.com/PostHog/posthog/pull/36561/commits/592983115c37ca019a4eccd17ff8fb6b7de7d396",
"pr_file_module": null
}
]
},
{
"discussion_id": "2257407472",
"pr_number": 36254,
"pr_file": "bin/posthog-worktree",
"created_at": "2025-08-06T14:35:16+00:00",
"commented_code": "cp \"${main_repo}/.flox/env/manifest.toml\" \"${worktree_path}/.flox/env/\"\n cp \"${main_repo}/.envrc\" \"${worktree_path}/\"\n \n+ # Copy .vscode/settings.json if it exists\n+ if [[ -f \"${main_repo}/.vscode/settings.json\" ]]; then\n+ print_color blue \"Copying .vscode/settings.json\u2026\"\n+ mkdir -p \"${worktree_path}/.vscode\"\n+ cp \"${main_repo}/.vscode/settings.json\" \"${worktree_path}/.vscode/\"\n+ fi\n+ \n+ # Copy any .env files if they exist\n+ if find \"${main_repo}\" -maxdepth 1 -name \".env*\" -type f | grep -q .; then",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2257407472",
"repo_full_name": "PostHog/posthog",
"pr_number": 36254,
"pr_file": "bin/posthog-worktree",
"discussion_id": "2257407472",
"commented_code": "@@ -183,6 +183,19 @@ setup_worktree_environment() {\n cp \"${main_repo}/.flox/env/manifest.toml\" \"${worktree_path}/.flox/env/\"\n cp \"${main_repo}/.envrc\" \"${worktree_path}/\"\n \n+ # Copy .vscode/settings.json if it exists\n+ if [[ -f \"${main_repo}/.vscode/settings.json\" ]]; then\n+ print_color blue \"Copying .vscode/settings.json\u2026\"\n+ mkdir -p \"${worktree_path}/.vscode\"\n+ cp \"${main_repo}/.vscode/settings.json\" \"${worktree_path}/.vscode/\"\n+ fi\n+ \n+ # Copy any .env files if they exist\n+ if find \"${main_repo}\" -maxdepth 1 -name \".env*\" -type f | grep -q .; then",
"comment_created_at": "2025-08-06T14:35:16+00:00",
"comment_author": "haacked",
"comment_body": "Should this be more strict and copy `.env` and `.env.*`?",
"pr_file_module": null
},
{
"comment_id": "2257417700",
"repo_full_name": "PostHog/posthog",
"pr_number": 36254,
"pr_file": "bin/posthog-worktree",
"discussion_id": "2257407472",
"commented_code": "@@ -183,6 +183,19 @@ setup_worktree_environment() {\n cp \"${main_repo}/.flox/env/manifest.toml\" \"${worktree_path}/.flox/env/\"\n cp \"${main_repo}/.envrc\" \"${worktree_path}/\"\n \n+ # Copy .vscode/settings.json if it exists\n+ if [[ -f \"${main_repo}/.vscode/settings.json\" ]]; then\n+ print_color blue \"Copying .vscode/settings.json\u2026\"\n+ mkdir -p \"${worktree_path}/.vscode\"\n+ cp \"${main_repo}/.vscode/settings.json\" \"${worktree_path}/.vscode/\"\n+ fi\n+ \n+ # Copy any .env files if they exist\n+ if find \"${main_repo}\" -maxdepth 1 -name \".env*\" -type f | grep -q .; then",
"comment_created_at": "2025-08-06T14:38:37+00:00",
"comment_author": "kappa90",
"comment_body": "Updated :)",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,32 @@
---
title: Local configuration exclusion
description: Exclude personal and local configuration files from version control while
ensuring they are properly handled during environment setup. Personal configurations
should use patterns like `*.local.*` or be placed in designated directories that
are gitignored to prevent accidental commits of developer-specific settings.
repository: PostHog/posthog
label: Configurations
language: Other
comments_count: 2
repository_stars: 28460
---

Exclude personal and local configuration files from version control while ensuring they are properly handled during environment setup. Personal configurations should use patterns like `*.local.*` or be placed in designated directories that are gitignored to prevent accidental commits of developer-specific settings.

When setting up new environments or worktrees, be explicit about which configuration files to copy. Use specific patterns rather than wildcards to avoid copying unintended files.

Example:

```bash
# In .gitignore
bin/mprocs*.local.yaml
.env.local
playground/

# In setup scripts - be specific about what to copy
if [[ -f "${main_repo}/.env" ]]; then
    cp "${main_repo}/.env" "${worktree_path}/"
fi
# Rather than copying all .env* files indiscriminately
```

This prevents personal development configurations from polluting the shared codebase while ensuring consistent environment setup across different development contexts.

View File

@@ -0,0 +1,196 @@
[
{
"discussion_id": "2277342773",
"pr_number": 36655,
"pr_file": "frontend/src/scenes/onboarding/sdks/sdksLogic.tsx",
"created_at": "2025-08-14T17:45:25+00:00",
"commented_code": "values: [\n onboardingLogic,\n ['productKey'],\n- liveEventsTableLogic,\n+ liveEventsTableLogic({ tabId: 'sdks' }),",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2277342773",
"repo_full_name": "PostHog/posthog",
"pr_number": 36655,
"pr_file": "frontend/src/scenes/onboarding/sdks/sdksLogic.tsx",
"discussion_id": "2277342773",
"commented_code": "@@ -47,7 +47,7 @@ export const sdksLogic = kea<sdksLogicType>([\n values: [\n onboardingLogic,\n ['productKey'],\n- liveEventsTableLogic,\n+ liveEventsTableLogic({ tabId: 'sdks' }),",
"comment_created_at": "2025-08-14T17:45:25+00:00",
"comment_author": "rafaeelaudibert",
"comment_body": "Same as above, what happens if we have several tabs open?",
"pr_file_module": null
},
{
"comment_id": "2277373764",
"repo_full_name": "PostHog/posthog",
"pr_number": 36655,
"pr_file": "frontend/src/scenes/onboarding/sdks/sdksLogic.tsx",
"discussion_id": "2277342773",
"commented_code": "@@ -47,7 +47,7 @@ export const sdksLogic = kea<sdksLogicType>([\n values: [\n onboardingLogic,\n ['productKey'],\n- liveEventsTableLogic,\n+ liveEventsTableLogic({ tabId: 'sdks' }),",
"comment_created_at": "2025-08-14T18:00:05+00:00",
"comment_author": "mariusandra",
"comment_body": "Since both of these are for the onboarding scene, I think it makes sense to use the same logic, so renamed `sdks` -> `onboarding`. Now all onboarding pages will just talk to the same live events logic.",
"pr_file_module": null
}
]
},
{
"discussion_id": "2269463763",
"pr_number": 36421,
"pr_file": "products/actions/frontend/components/NewActionButton.tsx",
"created_at": "2025-08-12T10:51:40+00:00",
"commented_code": "icon={<IconPencil />}\n onClick={() => {\n onSelectOption?.()\n- router.actions.push(urls.createAction())\n+ router.actions.push(productUrls.createAction())",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2269463763",
"repo_full_name": "PostHog/posthog",
"pr_number": 36421,
"pr_file": "products/actions/frontend/components/NewActionButton.tsx",
"discussion_id": "2269463763",
"commented_code": "@@ -61,7 +61,7 @@ export function NewActionButton({ onSelectOption }: { onSelectOption?: () => voi\n icon={<IconPencil />}\n onClick={() => {\n onSelectOption?.()\n- router.actions.push(urls.createAction())\n+ router.actions.push(productUrls.createAction())",
"comment_created_at": "2025-08-12T10:51:40+00:00",
"comment_author": "mariusandra",
"comment_body": "Same comment for urls, here and elsewhere",
"pr_file_module": null
},
{
"comment_id": "2270921589",
"repo_full_name": "PostHog/posthog",
"pr_number": 36421,
"pr_file": "products/actions/frontend/components/NewActionButton.tsx",
"discussion_id": "2269463763",
"commented_code": "@@ -61,7 +61,7 @@ export function NewActionButton({ onSelectOption }: { onSelectOption?: () => voi\n icon={<IconPencil />}\n onClick={() => {\n onSelectOption?.()\n- router.actions.push(urls.createAction())\n+ router.actions.push(productUrls.createAction())",
"comment_created_at": "2025-08-12T19:15:22+00:00",
"comment_author": "mariusandra",
"comment_body": "should be `urls`",
"pr_file_module": null
}
]
},
{
"discussion_id": "2269183800",
"pr_number": 36351,
"pr_file": "frontend/src/scenes/max/maxGlobalLogic.tsx",
"created_at": "2025-08-12T08:59:18+00:00",
"commented_code": "import { sceneLogic } from 'scenes/sceneLogic'\n import { urls } from 'scenes/urls'\n import { router } from 'kea-router'\n-import { AssistantContextualTool, AssistantNavigateUrls } from '~/queries/schema/schema-assistant-messages'\n+import { AssistantNavigateUrls } from '~/queries/schema/schema-assistant-messages'\n import { routes } from 'scenes/scenes'\n-import { IconCompass } from '@posthog/icons'\n+import { IconBook, IconCompass, IconEye } from '@posthog/icons'\n import { Scene } from 'scenes/sceneTypes'\n import { SidePanelTab } from '~/types'\n import { sidePanelLogic } from '~/layout/navigation-3000/sidepanel/sidePanelLogic'\n import { featureFlagLogic } from 'lib/logic/featureFlagLogic'\n+import { TOOL_DEFINITIONS, ToolRegistration } from './max-constants'\n \n-export interface ToolDefinition {\n- /** A unique identifier for the tool */\n- name: AssistantContextualTool\n- /** A user-friendly display name for the tool. This must be a verb phrase, like \"Create smth\" or \"Search smth\" */\n- displayName: string\n- /** A user-friendly description for the tool */\n- description: `Max can ${string}`\n- /**\n- * Optional specific @posthog/icons icon\n- * @default <IconWrench />\n- */\n- icon?: React.ReactNode\n- /** Contextual data to be included for use by the LLM */\n- context: Record<string, any>\n- /**\n- * Optional: If this tool is the main one of the page, you can override Max's default intro headline and description when it's mounted.\n- *\n- * Note that if more than one mounted tool has an intro override, only one will take effect.\n- */\n- introOverride?: {\n- /** The default is something like \"How can I help you build?\" - stick true to this question form. */\n- headline: string\n- /** The default is \"Ask me about your product and your users.\" */\n- description: string\n- }\n- /** Optional: When in context, the tool can add items to the pool of Max's suggested questions */\n- suggestions?: string[] // TODO: Suggestions aren't used yet, pending a refactor of maxLogic's allSuggestions\n- /** The callback function that will be executed with the LLM's tool call output */\n- callback: (toolOutput: any) => void | Promise<void>\n-}\n+/** Tools available everywhere. These CAN be shadowed by contextual tools for scene-specific handling (e.g. to intercept insight creation). */\n+export const STATIC_TOOLS: ToolRegistration[] = [\n+ {\n+ identifier: 'navigate' as const,\n+ name: TOOL_DEFINITIONS['navigate'].name,\n+ description: TOOL_DEFINITIONS['navigate'].description,\n+ icon: <IconCompass />,\n+ context: { current_page: location.pathname },\n+ callback: async (toolOutput) => {\n+ const { page_key: pageKey } = toolOutput\n+ if (!(pageKey in urls)) {\n+ throw new Error(`${pageKey} not in urls`)\n+ }\n+ const url = urls[pageKey as AssistantNavigateUrls]()\n+ router.actions.push(url)\n+ // First wait for navigation to complete\n+ await new Promise<void>((resolve, reject) => {\n+ const NAVIGATION_TIMEOUT = 1000 // 1 second timeout\n+ const startTime = performance.now()\n+ const checkPathname = (): void => {\n+ if (sceneLogic.values.activeScene === routes[url]?.[0]) {\n+ resolve()\n+ } else if (performance.now() - startTime > NAVIGATION_TIMEOUT) {\n+ reject(new Error('Navigation timeout'))\n+ } else {\n+ setTimeout(checkPathname, 50)\n+ }\n+ }\n+ checkPathname()\n+ })\n+ },\n+ },\n+ {\n+ identifier: 'search_docs' as const,",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2269183800",
"repo_full_name": "PostHog/posthog",
"pr_number": 36351,
"pr_file": "frontend/src/scenes/max/maxGlobalLogic.tsx",
"discussion_id": "2269183800",
"commented_code": "@@ -7,44 +7,60 @@ import type { maxGlobalLogicType } from './maxGlobalLogicType'\n import { sceneLogic } from 'scenes/sceneLogic'\n import { urls } from 'scenes/urls'\n import { router } from 'kea-router'\n-import { AssistantContextualTool, AssistantNavigateUrls } from '~/queries/schema/schema-assistant-messages'\n+import { AssistantNavigateUrls } from '~/queries/schema/schema-assistant-messages'\n import { routes } from 'scenes/scenes'\n-import { IconCompass } from '@posthog/icons'\n+import { IconBook, IconCompass, IconEye } from '@posthog/icons'\n import { Scene } from 'scenes/sceneTypes'\n import { SidePanelTab } from '~/types'\n import { sidePanelLogic } from '~/layout/navigation-3000/sidepanel/sidePanelLogic'\n import { featureFlagLogic } from 'lib/logic/featureFlagLogic'\n+import { TOOL_DEFINITIONS, ToolRegistration } from './max-constants'\n \n-export interface ToolDefinition {\n- /** A unique identifier for the tool */\n- name: AssistantContextualTool\n- /** A user-friendly display name for the tool. This must be a verb phrase, like \"Create smth\" or \"Search smth\" */\n- displayName: string\n- /** A user-friendly description for the tool */\n- description: `Max can ${string}`\n- /**\n- * Optional specific @posthog/icons icon\n- * @default <IconWrench />\n- */\n- icon?: React.ReactNode\n- /** Contextual data to be included for use by the LLM */\n- context: Record<string, any>\n- /**\n- * Optional: If this tool is the main one of the page, you can override Max's default intro headline and description when it's mounted.\n- *\n- * Note that if more than one mounted tool has an intro override, only one will take effect.\n- */\n- introOverride?: {\n- /** The default is something like \"How can I help you build?\" - stick true to this question form. */\n- headline: string\n- /** The default is \"Ask me about your product and your users.\" */\n- description: string\n- }\n- /** Optional: When in context, the tool can add items to the pool of Max's suggested questions */\n- suggestions?: string[] // TODO: Suggestions aren't used yet, pending a refactor of maxLogic's allSuggestions\n- /** The callback function that will be executed with the LLM's tool call output */\n- callback: (toolOutput: any) => void | Promise<void>\n-}\n+/** Tools available everywhere. These CAN be shadowed by contextual tools for scene-specific handling (e.g. to intercept insight creation). */\n+export const STATIC_TOOLS: ToolRegistration[] = [\n+ {\n+ identifier: 'navigate' as const,\n+ name: TOOL_DEFINITIONS['navigate'].name,\n+ description: TOOL_DEFINITIONS['navigate'].description,\n+ icon: <IconCompass />,\n+ context: { current_page: location.pathname },\n+ callback: async (toolOutput) => {\n+ const { page_key: pageKey } = toolOutput\n+ if (!(pageKey in urls)) {\n+ throw new Error(`${pageKey} not in urls`)\n+ }\n+ const url = urls[pageKey as AssistantNavigateUrls]()\n+ router.actions.push(url)\n+ // First wait for navigation to complete\n+ await new Promise<void>((resolve, reject) => {\n+ const NAVIGATION_TIMEOUT = 1000 // 1 second timeout\n+ const startTime = performance.now()\n+ const checkPathname = (): void => {\n+ if (sceneLogic.values.activeScene === routes[url]?.[0]) {\n+ resolve()\n+ } else if (performance.now() - startTime > NAVIGATION_TIMEOUT) {\n+ reject(new Error('Navigation timeout'))\n+ } else {\n+ setTimeout(checkPathname, 50)\n+ }\n+ }\n+ checkPathname()\n+ })\n+ },\n+ },\n+ {\n+ identifier: 'search_docs' as const,",
"comment_created_at": "2025-08-12T08:59:18+00:00",
"comment_author": "sortafreel",
"comment_body": "Nitpick: I assume `search_docs` equals to `search_documentation` on BE? A bit confused with naming.",
"pr_file_module": null
},
{
"comment_id": "2269653011",
"repo_full_name": "PostHog/posthog",
"pr_number": 36351,
"pr_file": "frontend/src/scenes/max/maxGlobalLogic.tsx",
"discussion_id": "2269183800",
"commented_code": "@@ -7,44 +7,60 @@ import type { maxGlobalLogicType } from './maxGlobalLogicType'\n import { sceneLogic } from 'scenes/sceneLogic'\n import { urls } from 'scenes/urls'\n import { router } from 'kea-router'\n-import { AssistantContextualTool, AssistantNavigateUrls } from '~/queries/schema/schema-assistant-messages'\n+import { AssistantNavigateUrls } from '~/queries/schema/schema-assistant-messages'\n import { routes } from 'scenes/scenes'\n-import { IconCompass } from '@posthog/icons'\n+import { IconBook, IconCompass, IconEye } from '@posthog/icons'\n import { Scene } from 'scenes/sceneTypes'\n import { SidePanelTab } from '~/types'\n import { sidePanelLogic } from '~/layout/navigation-3000/sidepanel/sidePanelLogic'\n import { featureFlagLogic } from 'lib/logic/featureFlagLogic'\n+import { TOOL_DEFINITIONS, ToolRegistration } from './max-constants'\n \n-export interface ToolDefinition {\n- /** A unique identifier for the tool */\n- name: AssistantContextualTool\n- /** A user-friendly display name for the tool. This must be a verb phrase, like \"Create smth\" or \"Search smth\" */\n- displayName: string\n- /** A user-friendly description for the tool */\n- description: `Max can ${string}`\n- /**\n- * Optional specific @posthog/icons icon\n- * @default <IconWrench />\n- */\n- icon?: React.ReactNode\n- /** Contextual data to be included for use by the LLM */\n- context: Record<string, any>\n- /**\n- * Optional: If this tool is the main one of the page, you can override Max's default intro headline and description when it's mounted.\n- *\n- * Note that if more than one mounted tool has an intro override, only one will take effect.\n- */\n- introOverride?: {\n- /** The default is something like \"How can I help you build?\" - stick true to this question form. */\n- headline: string\n- /** The default is \"Ask me about your product and your users.\" */\n- description: string\n- }\n- /** Optional: When in context, the tool can add items to the pool of Max's suggested questions */\n- suggestions?: string[] // TODO: Suggestions aren't used yet, pending a refactor of maxLogic's allSuggestions\n- /** The callback function that will be executed with the LLM's tool call output */\n- callback: (toolOutput: any) => void | Promise<void>\n-}\n+/** Tools available everywhere. These CAN be shadowed by contextual tools for scene-specific handling (e.g. to intercept insight creation). */\n+export const STATIC_TOOLS: ToolRegistration[] = [\n+ {\n+ identifier: 'navigate' as const,\n+ name: TOOL_DEFINITIONS['navigate'].name,\n+ description: TOOL_DEFINITIONS['navigate'].description,\n+ icon: <IconCompass />,\n+ context: { current_page: location.pathname },\n+ callback: async (toolOutput) => {\n+ const { page_key: pageKey } = toolOutput\n+ if (!(pageKey in urls)) {\n+ throw new Error(`${pageKey} not in urls`)\n+ }\n+ const url = urls[pageKey as AssistantNavigateUrls]()\n+ router.actions.push(url)\n+ // First wait for navigation to complete\n+ await new Promise<void>((resolve, reject) => {\n+ const NAVIGATION_TIMEOUT = 1000 // 1 second timeout\n+ const startTime = performance.now()\n+ const checkPathname = (): void => {\n+ if (sceneLogic.values.activeScene === routes[url]?.[0]) {\n+ resolve()\n+ } else if (performance.now() - startTime > NAVIGATION_TIMEOUT) {\n+ reject(new Error('Navigation timeout'))\n+ } else {\n+ setTimeout(checkPathname, 50)\n+ }\n+ }\n+ checkPathname()\n+ })\n+ },\n+ },\n+ {\n+ identifier: 'search_docs' as const,",
"comment_created_at": "2025-08-12T12:16:58+00:00",
"comment_author": "Twixes",
"comment_body": "Yes, `search_documentation` is the graph path, `search_docs` is the tool's identifier. Slight inconsistency",
"pr_file_module": null
}
]
},
{
"discussion_id": "2250711770",
"pr_number": 36080,
"pr_file": "frontend/src/scenes/surveys/SurveyOverview.tsx",
"created_at": "2025-08-04T07:57:16+00:00",
"commented_code": "export function SurveyOverview(): JSX.Element {\n const { survey, selectedPageIndex, targetingFlagFilters } = useValues(surveyLogic)\n const { setSelectedPageIndex } = useActions(surveyLogic)\n- const { featureFlags } = useValues(featureFlagLogic)\n+\n+ const isExternalSurvey = survey.type === SurveyType.ExternalSurvey\n \n const { surveyUsesLimit, surveyUsesAdaptiveLimit } = useValues(surveyLogic)\n return (\n <div className=\"flex gap-4\">\n <dl className=\"flex flex-col gap-4 flex-1 overflow-hidden\">\n- <SurveyOption label=\"Display mode\">{SURVEY_TYPE_LABEL_MAP[survey.type]}</SurveyOption>\n+ <SurveyOption label=\"Display mode\">\n+ <div className=\"flex flex-col\">\n+ <div className=\"flex flex-row items-center gap-2\">\n+ {SURVEY_TYPE_LABEL_MAP[survey.type]}\n+ {isExternalSurvey && <CopySurveyLink surveyId={survey.id} className=\"w-fit\" />}\n+ </div>\n+ {isExternalSurvey && (\n+ <span>\n+ Track responses to users by adding{' '}",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2250711770",
"repo_full_name": "PostHog/posthog",
"pr_number": 36080,
"pr_file": "frontend/src/scenes/surveys/SurveyOverview.tsx",
"discussion_id": "2250711770",
"commented_code": "@@ -54,13 +52,28 @@ const QuestionIconMap = {\n export function SurveyOverview(): JSX.Element {\n const { survey, selectedPageIndex, targetingFlagFilters } = useValues(surveyLogic)\n const { setSelectedPageIndex } = useActions(surveyLogic)\n- const { featureFlags } = useValues(featureFlagLogic)\n+\n+ const isExternalSurvey = survey.type === SurveyType.ExternalSurvey\n \n const { surveyUsesLimit, surveyUsesAdaptiveLimit } = useValues(surveyLogic)\n return (\n <div className=\"flex gap-4\">\n <dl className=\"flex flex-col gap-4 flex-1 overflow-hidden\">\n- <SurveyOption label=\"Display mode\">{SURVEY_TYPE_LABEL_MAP[survey.type]}</SurveyOption>\n+ <SurveyOption label=\"Display mode\">\n+ <div className=\"flex flex-col\">\n+ <div className=\"flex flex-row items-center gap-2\">\n+ {SURVEY_TYPE_LABEL_MAP[survey.type]}\n+ {isExternalSurvey && <CopySurveyLink surveyId={survey.id} className=\"w-fit\" />}\n+ </div>\n+ {isExternalSurvey && (\n+ <span>\n+ Track responses to users by adding{' '}",
"comment_created_at": "2025-08-04T07:57:16+00:00",
"comment_author": "marandaneto",
"comment_body": "same as https://github.com/PostHog/posthog/pull/36080/files#r2250709670\r\nwe are using multiple wording for track/collect/identify response/identify responses to users, etc, lets agree on something and use it everywhere",
"pr_file_module": null
},
{
"comment_id": "2251680312",
"repo_full_name": "PostHog/posthog",
"pr_number": 36080,
"pr_file": "frontend/src/scenes/surveys/SurveyOverview.tsx",
"discussion_id": "2250711770",
"commented_code": "@@ -54,13 +52,28 @@ const QuestionIconMap = {\n export function SurveyOverview(): JSX.Element {\n const { survey, selectedPageIndex, targetingFlagFilters } = useValues(surveyLogic)\n const { setSelectedPageIndex } = useActions(surveyLogic)\n- const { featureFlags } = useValues(featureFlagLogic)\n+\n+ const isExternalSurvey = survey.type === SurveyType.ExternalSurvey\n \n const { surveyUsesLimit, surveyUsesAdaptiveLimit } = useValues(surveyLogic)\n return (\n <div className=\"flex gap-4\">\n <dl className=\"flex flex-col gap-4 flex-1 overflow-hidden\">\n- <SurveyOption label=\"Display mode\">{SURVEY_TYPE_LABEL_MAP[survey.type]}</SurveyOption>\n+ <SurveyOption label=\"Display mode\">\n+ <div className=\"flex flex-col\">\n+ <div className=\"flex flex-row items-center gap-2\">\n+ {SURVEY_TYPE_LABEL_MAP[survey.type]}\n+ {isExternalSurvey && <CopySurveyLink surveyId={survey.id} className=\"w-fit\" />}\n+ </div>\n+ {isExternalSurvey && (\n+ <span>\n+ Track responses to users by adding{' '}",
"comment_created_at": "2025-08-04T14:30:08+00:00",
"comment_author": "lucasheriques",
"comment_body": "changed language to \"identify respondents\" on all places here (will do the same on docs)",
"pr_file_module": null
}
]
},
{
"discussion_id": "2237904401",
"pr_number": 35754,
"pr_file": "frontend/src/layout/FeaturePreviews/featurePreviewsLogic.tsx",
"created_at": "2025-07-28T21:33:28+00:00",
"commented_code": "actions: [supportLogic, ['submitZendeskTicket']],\n })),\n actions({\n- updateEarlyAccessFeatureEnrollment: (flagKey: string, enabled: boolean) => ({ flagKey, enabled }),\n+ updateEarlyAccessFeatureEnrollment: (flagKey: string, enabled: boolean, stage?: string) => ({\n+ flagKey,\n+ enabled,\n+ stage,\n+ }),",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2237904401",
"repo_full_name": "PostHog/posthog",
"pr_number": 35754,
"pr_file": "frontend/src/layout/FeaturePreviews/featurePreviewsLogic.tsx",
"discussion_id": "2237904401",
"commented_code": "@@ -26,7 +26,11 @@ export const featurePreviewsLogic = kea<featurePreviewsLogicType>([\n actions: [supportLogic, ['submitZendeskTicket']],\n })),\n actions({\n- updateEarlyAccessFeatureEnrollment: (flagKey: string, enabled: boolean) => ({ flagKey, enabled }),\n+ updateEarlyAccessFeatureEnrollment: (flagKey: string, enabled: boolean, stage?: string) => ({\n+ flagKey,\n+ enabled,\n+ stage,\n+ }),",
"comment_created_at": "2025-07-28T21:33:28+00:00",
"comment_author": "dmarticus",
"comment_body": "minor, but should `stage` be an enum instead of a string? ",
"pr_file_module": null
},
{
"comment_id": "2237925022",
"repo_full_name": "PostHog/posthog",
"pr_number": 35754,
"pr_file": "frontend/src/layout/FeaturePreviews/featurePreviewsLogic.tsx",
"discussion_id": "2237904401",
"commented_code": "@@ -26,7 +26,11 @@ export const featurePreviewsLogic = kea<featurePreviewsLogicType>([\n actions: [supportLogic, ['submitZendeskTicket']],\n })),\n actions({\n- updateEarlyAccessFeatureEnrollment: (flagKey: string, enabled: boolean) => ({ flagKey, enabled }),\n+ updateEarlyAccessFeatureEnrollment: (flagKey: string, enabled: boolean, stage?: string) => ({\n+ flagKey,\n+ enabled,\n+ stage,\n+ }),",
"comment_created_at": "2025-07-28T21:49:41+00:00",
"comment_author": "haacked",
"comment_body": "Looks like there's an existing enum:\r\n\r\n```ts\r\nexport enum EarlyAccessFeatureStage {\r\n Draft = 'draft',\r\n Concept = 'concept',\r\n Alpha = 'alpha',\r\n Beta = 'beta',\r\n GeneralAvailability = 'general-availability',\r\n Archived = 'archived',\r\n}\r\n```\r\n\r\nI noticed that in `posthog-js` there's also an enum, but it has less options:\r\n\r\n```ts\r\nexport type EarlyAccessFeatureStage = 'concept' | 'alpha' | 'beta' | 'general-availability'\r\n```\r\n\r\nSeems to me that these two should be the same and we should make use the enum type on both sides. Or should we let the js version continue to accept a string and only make the type change here?",
"pr_file_module": null
},
{
"comment_id": "2237932526",
"repo_full_name": "PostHog/posthog",
"pr_number": 35754,
"pr_file": "frontend/src/layout/FeaturePreviews/featurePreviewsLogic.tsx",
"discussion_id": "2237904401",
"commented_code": "@@ -26,7 +26,11 @@ export const featurePreviewsLogic = kea<featurePreviewsLogicType>([\n actions: [supportLogic, ['submitZendeskTicket']],\n })),\n actions({\n- updateEarlyAccessFeatureEnrollment: (flagKey: string, enabled: boolean) => ({ flagKey, enabled }),\n+ updateEarlyAccessFeatureEnrollment: (flagKey: string, enabled: boolean, stage?: string) => ({\n+ flagKey,\n+ enabled,\n+ stage,\n+ }),",
"comment_created_at": "2025-07-28T21:55:47+00:00",
"comment_author": "dmarticus",
"comment_body": "ooh yeah they should be the same... annoying that the JS SDK is less-principled.\r\n\r\nIn an ideal world I would unite these enums across product + SDK. In reality... your call.",
"pr_file_module": null
},
{
"comment_id": "2252055074",
"repo_full_name": "PostHog/posthog",
"pr_number": 35754,
"pr_file": "frontend/src/layout/FeaturePreviews/featurePreviewsLogic.tsx",
"discussion_id": "2237904401",
"commented_code": "@@ -26,7 +26,11 @@ export const featurePreviewsLogic = kea<featurePreviewsLogicType>([\n actions: [supportLogic, ['submitZendeskTicket']],\n })),\n actions({\n- updateEarlyAccessFeatureEnrollment: (flagKey: string, enabled: boolean) => ({ flagKey, enabled }),\n+ updateEarlyAccessFeatureEnrollment: (flagKey: string, enabled: boolean, stage?: string) => ({\n+ flagKey,\n+ enabled,\n+ stage,\n+ }),",
"comment_created_at": "2025-08-04T16:53:12+00:00",
"comment_author": "haacked",
"comment_body": "I'm going to leave it as a string for now. We can update this later as it sounds like we may be investing more in this area soon.",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,35 @@
---
title: Maintain naming consistency
description: Ensure consistent naming conventions, terminology, and identifiers across
the entire codebase. Names should be uniform between frontend/backend, across different
modules, and within the same domain.
repository: PostHog/posthog
label: Naming Conventions
language: TSX
comments_count: 5
repository_stars: 28460
---

Ensure consistent naming conventions, terminology, and identifiers across the entire codebase. Names should be uniform between frontend and backend, across different modules, and within the same domain.

Key areas to check:
- **Cross-system consistency**: Frontend and backend should use the same identifiers for the same concepts (e.g., `search_docs` vs `search_documentation`)
- **Module consistency**: Related modules should use consistent naming patterns (e.g., `sdks` vs `onboarding` logic should align with their actual scope)
- **Terminology consistency**: Use the same terms throughout the codebase for the same concepts (e.g., standardize on "identify respondents" rather than mixing "track/collect/identify responses")
- **Type consistency**: Use the same enums/types across frontend and backend for shared concepts (see the enum sketch below)
- **Import consistency**: Consistently use the same import sources (e.g., always use `urls` rather than mixing `urls` and `productUrls`)

Example of inconsistent naming:

```typescript
// Frontend uses a different identifier than the backend
identifier: 'search_docs' as const, // Frontend
// vs
search_documentation // Backend API path

// Mixed terminology in the UI
"Track responses to users"
"Collect user responses"
"Identify respondents" // Should standardize on one
```

Before merging, verify that new names align with existing patterns and that any changes maintain consistency across all related files and systems.
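
Where shared enums already exist, reuse them instead of widening to `string`. A sketch based on the `EarlyAccessFeatureStage` discussion (the function signature is illustrative):

```typescript
// One shared enum, used by both the product frontend and the SDK
export enum EarlyAccessFeatureStage {
    Draft = 'draft',
    Concept = 'concept',
    Alpha = 'alpha',
    Beta = 'beta',
    GeneralAvailability = 'general-availability',
    Archived = 'archived',
}

// Prefer the enum over a loose string in signatures
function updateEarlyAccessFeatureEnrollment(
    flagKey: string,
    enabled: boolean,
    stage?: EarlyAccessFeatureStage // rather than `stage?: string`
): void {
    // ...
}
```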

View File

@@ -0,0 +1,106 @@
[
{
"discussion_id": "2262597134",
"pr_number": 36360,
"pr_file": "frontend/src/scenes/insights/filters/BreakdownFilter/BreakdownTagMenu.tsx",
"created_at": "2025-08-08T10:40:23+00:00",
"commented_code": "onClick={() => {\n setHistogramBinsUsed(false)\n }}\n- active={!histogramBinsUsed}\n+ active={!histogramBinsUsed && !breakdownBins?.length}\n className=\"mt-2\"\n fullWidth\n >\n Do not bin numeric values\n </LemonButton>\n+ <LemonButton\n+ onClick={() => {\n+ setBreakdownBins(\n+ breakdown,\n+ breakdownType,\n+ !breakdownBins?.length ? [{ low: null, high: null }] : []\n+ )\n+ }}\n+ active={!!breakdownBins?.length}\n+ className=\"mt-2\"\n+ fullWidth\n+ >\n+ Custom bins\n+ </LemonButton>\n+ {!!breakdownBins?.length && (\n+ <div className=\"p-2\">\n+ <div className=\"space-y-2\">\n+ {breakdownBins.map((bin: BreakdownBin, index: number) => (\n+ <div key={index} className=\"flex items-center gap-2\">\n+ <LemonInput\n+ type=\"number\"\n+ className=\"w-24\"\n+ value={bin.low ?? undefined}\n+ onChange={(lowNum) => {\n+ const low = lowNum !== undefined ? lowNum : null\n+ setBreakdownBins(\n+ breakdown,\n+ breakdownType,\n+ breakdownBins.map((b: BreakdownBin, i: number) =>\n+ i === index ? { ...b, low } : b\n+ )\n+ )\n+ }}\n+ placeholder=\"Min\"\n+ />\n+ <span>-</span>\n+ <LemonInput\n+ type=\"number\"\n+ className=\"w-24\"\n+ value={bin.high ?? undefined}\n+ onChange={(highNum) => {",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2262597134",
"repo_full_name": "PostHog/posthog",
"pr_number": 36360,
"pr_file": "frontend/src/scenes/insights/filters/BreakdownFilter/BreakdownTagMenu.tsx",
"discussion_id": "2262597134",
"commented_code": "@@ -80,12 +89,93 @@ export const BreakdownTagMenu = (): JSX.Element => {\n onClick={() => {\n setHistogramBinsUsed(false)\n }}\n- active={!histogramBinsUsed}\n+ active={!histogramBinsUsed && !breakdownBins?.length}\n className=\"mt-2\"\n fullWidth\n >\n Do not bin numeric values\n </LemonButton>\n+ <LemonButton\n+ onClick={() => {\n+ setBreakdownBins(\n+ breakdown,\n+ breakdownType,\n+ !breakdownBins?.length ? [{ low: null, high: null }] : []\n+ )\n+ }}\n+ active={!!breakdownBins?.length}\n+ className=\"mt-2\"\n+ fullWidth\n+ >\n+ Custom bins\n+ </LemonButton>\n+ {!!breakdownBins?.length && (\n+ <div className=\"p-2\">\n+ <div className=\"space-y-2\">\n+ {breakdownBins.map((bin: BreakdownBin, index: number) => (\n+ <div key={index} className=\"flex items-center gap-2\">\n+ <LemonInput\n+ type=\"number\"\n+ className=\"w-24\"\n+ value={bin.low ?? undefined}\n+ onChange={(lowNum) => {\n+ const low = lowNum !== undefined ? lowNum : null\n+ setBreakdownBins(\n+ breakdown,\n+ breakdownType,\n+ breakdownBins.map((b: BreakdownBin, i: number) =>\n+ i === index ? { ...b, low } : b\n+ )\n+ )\n+ }}\n+ placeholder=\"Min\"\n+ />\n+ <span>-</span>\n+ <LemonInput\n+ type=\"number\"\n+ className=\"w-24\"\n+ value={bin.high ?? undefined}\n+ onChange={(highNum) => {",
"comment_created_at": "2025-08-08T10:40:23+00:00",
"comment_author": "zlwaterfield",
"comment_body": "I think we should only requery on blur not change because the UI becomes very jumpy",
"pr_file_module": null
},
{
"comment_id": "2262632310",
"repo_full_name": "PostHog/posthog",
"pr_number": 36360,
"pr_file": "frontend/src/scenes/insights/filters/BreakdownFilter/BreakdownTagMenu.tsx",
"discussion_id": "2262597134",
"commented_code": "@@ -80,12 +89,93 @@ export const BreakdownTagMenu = (): JSX.Element => {\n onClick={() => {\n setHistogramBinsUsed(false)\n }}\n- active={!histogramBinsUsed}\n+ active={!histogramBinsUsed && !breakdownBins?.length}\n className=\"mt-2\"\n fullWidth\n >\n Do not bin numeric values\n </LemonButton>\n+ <LemonButton\n+ onClick={() => {\n+ setBreakdownBins(\n+ breakdown,\n+ breakdownType,\n+ !breakdownBins?.length ? [{ low: null, high: null }] : []\n+ )\n+ }}\n+ active={!!breakdownBins?.length}\n+ className=\"mt-2\"\n+ fullWidth\n+ >\n+ Custom bins\n+ </LemonButton>\n+ {!!breakdownBins?.length && (\n+ <div className=\"p-2\">\n+ <div className=\"space-y-2\">\n+ {breakdownBins.map((bin: BreakdownBin, index: number) => (\n+ <div key={index} className=\"flex items-center gap-2\">\n+ <LemonInput\n+ type=\"number\"\n+ className=\"w-24\"\n+ value={bin.low ?? undefined}\n+ onChange={(lowNum) => {\n+ const low = lowNum !== undefined ? lowNum : null\n+ setBreakdownBins(\n+ breakdown,\n+ breakdownType,\n+ breakdownBins.map((b: BreakdownBin, i: number) =>\n+ i === index ? { ...b, low } : b\n+ )\n+ )\n+ }}\n+ placeholder=\"Min\"\n+ />\n+ <span>-</span>\n+ <LemonInput\n+ type=\"number\"\n+ className=\"w-24\"\n+ value={bin.high ?? undefined}\n+ onChange={(highNum) => {",
"comment_created_at": "2025-08-08T10:56:11+00:00",
"comment_author": "anirudhpillai",
"comment_body": "I'm iterating on this heavily as we speak, but seems to make sense to have a save button here to prevent all the re query (breakdown queries are expensive as is)\r\n\r\n<img width=\"323\" height=\"413\" alt=\"image\" src=\"https://github.com/user-attachments/assets/c2ef22f6-eba0-4de2-a307-2ca48046d9ea\" />\r\n",
"pr_file_module": null
},
{
"comment_id": "2262634301",
"repo_full_name": "PostHog/posthog",
"pr_number": 36360,
"pr_file": "frontend/src/scenes/insights/filters/BreakdownFilter/BreakdownTagMenu.tsx",
"discussion_id": "2262597134",
"commented_code": "@@ -80,12 +89,93 @@ export const BreakdownTagMenu = (): JSX.Element => {\n onClick={() => {\n setHistogramBinsUsed(false)\n }}\n- active={!histogramBinsUsed}\n+ active={!histogramBinsUsed && !breakdownBins?.length}\n className=\"mt-2\"\n fullWidth\n >\n Do not bin numeric values\n </LemonButton>\n+ <LemonButton\n+ onClick={() => {\n+ setBreakdownBins(\n+ breakdown,\n+ breakdownType,\n+ !breakdownBins?.length ? [{ low: null, high: null }] : []\n+ )\n+ }}\n+ active={!!breakdownBins?.length}\n+ className=\"mt-2\"\n+ fullWidth\n+ >\n+ Custom bins\n+ </LemonButton>\n+ {!!breakdownBins?.length && (\n+ <div className=\"p-2\">\n+ <div className=\"space-y-2\">\n+ {breakdownBins.map((bin: BreakdownBin, index: number) => (\n+ <div key={index} className=\"flex items-center gap-2\">\n+ <LemonInput\n+ type=\"number\"\n+ className=\"w-24\"\n+ value={bin.low ?? undefined}\n+ onChange={(lowNum) => {\n+ const low = lowNum !== undefined ? lowNum : null\n+ setBreakdownBins(\n+ breakdown,\n+ breakdownType,\n+ breakdownBins.map((b: BreakdownBin, i: number) =>\n+ i === index ? { ...b, low } : b\n+ )\n+ )\n+ }}\n+ placeholder=\"Min\"\n+ />\n+ <span>-</span>\n+ <LemonInput\n+ type=\"number\"\n+ className=\"w-24\"\n+ value={bin.high ?? undefined}\n+ onChange={(highNum) => {",
"comment_created_at": "2025-08-08T10:57:21+00:00",
"comment_author": "zlwaterfield",
"comment_body": "Cool, sorry saw the requested review so jumped in but I think it may have been an automated review request",
"pr_file_module": null
},
{
"comment_id": "2262932343",
"repo_full_name": "PostHog/posthog",
"pr_number": 36360,
"pr_file": "frontend/src/scenes/insights/filters/BreakdownFilter/BreakdownTagMenu.tsx",
"discussion_id": "2262597134",
"commented_code": "@@ -80,12 +89,93 @@ export const BreakdownTagMenu = (): JSX.Element => {\n onClick={() => {\n setHistogramBinsUsed(false)\n }}\n- active={!histogramBinsUsed}\n+ active={!histogramBinsUsed && !breakdownBins?.length}\n className=\"mt-2\"\n fullWidth\n >\n Do not bin numeric values\n </LemonButton>\n+ <LemonButton\n+ onClick={() => {\n+ setBreakdownBins(\n+ breakdown,\n+ breakdownType,\n+ !breakdownBins?.length ? [{ low: null, high: null }] : []\n+ )\n+ }}\n+ active={!!breakdownBins?.length}\n+ className=\"mt-2\"\n+ fullWidth\n+ >\n+ Custom bins\n+ </LemonButton>\n+ {!!breakdownBins?.length && (\n+ <div className=\"p-2\">\n+ <div className=\"space-y-2\">\n+ {breakdownBins.map((bin: BreakdownBin, index: number) => (\n+ <div key={index} className=\"flex items-center gap-2\">\n+ <LemonInput\n+ type=\"number\"\n+ className=\"w-24\"\n+ value={bin.low ?? undefined}\n+ onChange={(lowNum) => {\n+ const low = lowNum !== undefined ? lowNum : null\n+ setBreakdownBins(\n+ breakdown,\n+ breakdownType,\n+ breakdownBins.map((b: BreakdownBin, i: number) =>\n+ i === index ? { ...b, low } : b\n+ )\n+ )\n+ }}\n+ placeholder=\"Min\"\n+ />\n+ <span>-</span>\n+ <LemonInput\n+ type=\"number\"\n+ className=\"w-24\"\n+ value={bin.high ?? undefined}\n+ onChange={(highNum) => {",
"comment_created_at": "2025-08-08T13:09:39+00:00",
"comment_author": "anirudhpillai",
"comment_body": "no worries at all, this is very helpful!\r\nthe reason I mention the save button is because of a discussion we were having at the hotel, will brief you. Mainly we have a long pending item 'insight edit mode' to add a run button to insight query so that it doesn't re query on every config change, but till then thought it makes sense to add a save button here",
"pr_file_module": null
}
]
},
{
"discussion_id": "2254270985",
"pr_number": 36200,
"pr_file": "frontend/src/scenes/surveys/SurveyEdit.tsx",
"created_at": "2025-08-05T12:58:57+00:00",
"commented_code": "const { thankYouMessageDescriptionContentType = null } = survey.appearance ?? {}\n useMountedLogic(actionsModel)\n \n+ // Load feature flag details if linked_flag_id exists but linked_flag doesn't\n+ useEffect(() => {\n+ if (survey.linked_flag_id && !survey.linked_flag) {\n+ api.featureFlags\n+ .get(survey.linked_flag_id)\n+ .then((flag) => {\n+ setSurveyValue('linked_flag', flag)\n+ })\n+ .catch(() => {\n+ // If flag doesn't exist anymore, clear the linked_flag_id\n+ setSurveyValue('linked_flag_id', null)\n+ })\n+ }\n+ }, [survey.linked_flag_id, survey.linked_flag, setSurveyValue])",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2254270985",
"repo_full_name": "PostHog/posthog",
"pr_number": 36200,
"pr_file": "frontend/src/scenes/surveys/SurveyEdit.tsx",
"discussion_id": "2254270985",
"commented_code": "@@ -249,6 +271,21 @@ export default function SurveyEdit(): JSX.Element {\n const { thankYouMessageDescriptionContentType = null } = survey.appearance ?? {}\n useMountedLogic(actionsModel)\n \n+ // Load feature flag details if linked_flag_id exists but linked_flag doesn't\n+ useEffect(() => {\n+ if (survey.linked_flag_id && !survey.linked_flag) {\n+ api.featureFlags\n+ .get(survey.linked_flag_id)\n+ .then((flag) => {\n+ setSurveyValue('linked_flag', flag)\n+ })\n+ .catch(() => {\n+ // If flag doesn't exist anymore, clear the linked_flag_id\n+ setSurveyValue('linked_flag_id', null)\n+ })\n+ }\n+ }, [survey.linked_flag_id, survey.linked_flag, setSurveyValue])",
"comment_created_at": "2025-08-05T12:58:57+00:00",
"comment_author": "marandaneto",
"comment_body": "is there a better way to do this?\r\nwhen flags are loaded and selected in the dropdown (only for new surveys), the `linked_flag` and its fields arent loaded yet.",
"pr_file_module": null
},
{
"comment_id": "2260403865",
"repo_full_name": "PostHog/posthog",
"pr_number": 36200,
"pr_file": "frontend/src/scenes/surveys/SurveyEdit.tsx",
"discussion_id": "2254270985",
"commented_code": "@@ -249,6 +271,21 @@ export default function SurveyEdit(): JSX.Element {\n const { thankYouMessageDescriptionContentType = null } = survey.appearance ?? {}\n useMountedLogic(actionsModel)\n \n+ // Load feature flag details if linked_flag_id exists but linked_flag doesn't\n+ useEffect(() => {\n+ if (survey.linked_flag_id && !survey.linked_flag) {\n+ api.featureFlags\n+ .get(survey.linked_flag_id)\n+ .then((flag) => {\n+ setSurveyValue('linked_flag', flag)\n+ })\n+ .catch(() => {\n+ // If flag doesn't exist anymore, clear the linked_flag_id\n+ setSurveyValue('linked_flag_id', null)\n+ })\n+ }\n+ }, [survey.linked_flag_id, survey.linked_flag, setSurveyValue])",
"comment_created_at": "2025-08-07T13:50:51+00:00",
"comment_author": "lucasheriques",
"comment_body": "instead of putting it on a `useEffect`, add it on the `onChange` handler for the flag id selector, as that's when it's needed\r\n\r\nor, you can use a [loader](https://keajs.org/docs/plugins/loaders) instead, and store all values in the kea logic\r\n\r\nso you'd do something like:\r\n\r\n```ts\r\nloaders(({ props, actions, values }) => ({\r\n ...otherLoaders,\r\n linkedFeatureFlag: {\r\n loadFeatureFlag: async () => {\r\n if (!values.survey?.linked_flag_id) {\r\n return undefined\r\n }\r\n\r\n return await api.featureFlags.get(values.survey.linked_flag_id)\r\n },\r\n },\r\n}))\r\n```",
"pr_file_module": null
},
{
"comment_id": "2260587259",
"repo_full_name": "PostHog/posthog",
"pr_number": 36200,
"pr_file": "frontend/src/scenes/surveys/SurveyEdit.tsx",
"discussion_id": "2254270985",
"commented_code": "@@ -249,6 +271,21 @@ export default function SurveyEdit(): JSX.Element {\n const { thankYouMessageDescriptionContentType = null } = survey.appearance ?? {}\n useMountedLogic(actionsModel)\n \n+ // Load feature flag details if linked_flag_id exists but linked_flag doesn't\n+ useEffect(() => {\n+ if (survey.linked_flag_id && !survey.linked_flag) {\n+ api.featureFlags\n+ .get(survey.linked_flag_id)\n+ .then((flag) => {\n+ setSurveyValue('linked_flag', flag)\n+ })\n+ .catch(() => {\n+ // If flag doesn't exist anymore, clear the linked_flag_id\n+ setSurveyValue('linked_flag_id', null)\n+ })\n+ }\n+ }, [survey.linked_flag_id, survey.linked_flag, setSurveyValue])",
"comment_created_at": "2025-08-07T14:55:49+00:00",
"comment_author": "marandaneto",
"comment_body": "done thanks",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,49 @@
---
title: minimize expensive operations
description: Avoid triggering expensive operations (queries, API calls, computations)
on every user input or state change. Instead, use appropriate triggers that balance
responsiveness with performance.
repository: PostHog/posthog
label: Performance Optimization
language: TSX
comments_count: 2
repository_stars: 28460
---

Avoid triggering expensive operations (queries, API calls, computations) on every user input or state change. Instead, use appropriate triggers that balance responsiveness with performance.

Key strategies:
- Use `onBlur` instead of `onChange` for expensive operations that don't need immediate feedback
- Implement save/submit buttons for complex forms to batch expensive operations (sketched after the example below)
- Move data fetching from `useEffect` to event handlers when the data is only needed in response to specific user actions
- Consider the cost-benefit ratio ("breakdown queries are expensive as is") and defer expensive operations until truly necessary

Example from the discussions:

```tsx
// Instead of triggering expensive queries on every change:
<LemonInput
    value={value}
    onChange={(value) => expensiveQuery(value)} // re-queries on every keystroke; UI becomes jumpy
/>

// Use onBlur (it receives a focus event rather than the value, so keep the value in state):
<LemonInput
    value={value}
    onChange={setValue}
    onBlur={() => expensiveQuery(value)} // queries once, when editing finishes
/>

// Or move data fetching from useEffect to event handlers.
// Instead of:
useEffect(() => {
    if (needsData) {
        fetchExpensiveData()
    }
}, [dependency])

// Use:
const handleUserAction = (): void => {
    if (needsData) {
        fetchExpensiveData() // Only runs when actually needed
    }
}
```

This approach reduces unnecessary resource utilization and prevents performance bottlenecks that degrade user experience.
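
A sketch of the save-button strategy from the custom-bins discussion (component and prop names are illustrative):

```tsx
import { useState } from 'react'
import { LemonButton, LemonInput } from '@posthog/lemon-ui'

function CustomBinsEditor({ bins, onSave }: { bins: number[]; onSave: (bins: number[]) => void }): JSX.Element {
    // Edits accumulate in local state; nothing expensive runs per keystroke
    const [draft, setDraft] = useState(bins)
    return (
        <div>
            {draft.map((bin, i) => (
                <LemonInput
                    key={i}
                    type="number"
                    value={bin}
                    onChange={(value) => setDraft(draft.map((b, j) => (j === i ? value ?? 0 : b)))}
                />
            ))}
            {/* The expensive breakdown query runs once, on explicit save */}
            <LemonButton type="primary" onClick={() => onSave(draft)}>
                Save
            </LemonButton>
        </div>
    )
}
```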

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,45 @@
---
title: optimize data loading
description: Review data loading operations to ensure they are properly scoped, filtered,
and batched to prevent performance issues. Large datasets should be handled with
appropriate pagination, filtering by relevant criteria (like date ranges), and avoiding
operations that could load excessive amounts of data into memory.
repository: PostHog/posthog
label: Performance Optimization
language: TypeScript
comments_count: 3
repository_stars: 28460
---

Review data loading operations to ensure they are properly scoped, filtered, and batched to prevent performance issues. Large datasets should be handled with appropriate pagination, filtering by relevant criteria (like date ranges), and avoiding operations that could load excessive amounts of data into memory.

Key areas to check:
- Avoid spread operators with large arrays that could cause memory issues (see the note after the examples below)
- Ensure pagination limits don't truncate important data: consider whether limits like "last 200 jobs" could miss records in high-volume scenarios
- Apply proper filtering before loading data rather than loading everything and filtering later
- Question whether all requested data is actually needed for the use case

Example of problematic pattern:

```typescript
// Loads ALL jobs for ALL sources without filtering
const allJobs = await Promise.all(
    dataSources.map(async (source) => {
        // This could return 283,914 items for large teams
        return await api.externalDataSources.jobs(source.id, null, null)
    })
)
```

Better approach:

```typescript
// Apply filtering and reasonable limits upfront
const recentJobs = await Promise.all(
    dataSources.map(async (source) => {
        return await api.externalDataSources.jobs(
            source.id,
            cutoffDate, // Filter by date
            REASONABLE_LIMIT // Appropriate batch size
        )
    })
)
```
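
On the spread-operator point from the checklist: spreading a very large array into a function call turns every element into a call argument, which can exceed the engine's argument limit and throw a RangeError. A minimal illustration (`fetchHugeBatch` is a hypothetical data source):

```typescript
declare function fetchHugeBatch(): number[] // hypothetical source of very many rows

const accumulated: number[] = []
const hugeBatch = fetchHugeBatch()

// Risky: each element becomes a call argument; large enough arrays throw RangeError
accumulated.push(...hugeBatch)

// Safer: concat copies without argument spreading...
const combined = accumulated.concat(hugeBatch)

// ...or append in a plain loop to extend in place
for (const row of hugeBatch) {
    accumulated.push(row)
}
```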

View File

@@ -0,0 +1,46 @@
[
{
"discussion_id": "2284651280",
"pr_number": 36529,
"pr_file": "plugin-server/src/worker/ingestion/persons/repositories/postgres-person-repository.ts",
"created_at": "2025-08-19T09:16:58+00:00",
"commented_code": "try {\n const { rows } = await this.postgres.query<RawPerson>(\n tx ?? PostgresUse.PERSONS_WRITE,\n- `WITH inserted_person AS (\n+ `${\n+ forcedId\n+ ? `WITH inserted_person AS (\n+ INSERT INTO posthog_person (\n+ id, created_at, properties, properties_last_updated_at,\n+ properties_last_operation, team_id, is_user_id, is_identified, uuid, version\n+ )\n+ VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)\n+ RETURNING *\n+ )`\n+ : `WITH inserted_person AS (\n INSERT INTO posthog_person (\n created_at, properties, properties_last_updated_at,\n properties_last_operation, team_id, is_user_id, is_identified, uuid, version\n )\n VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)\n RETURNING *\n- )` +\n+ )`\n+ } ` +\n distinctIds\n .map(\n // NOTE: Keep this in sync with the posthog_persondistinctid INSERT in\n // `addDistinctId`\n (_, index) => `, distinct_id_${index} AS (\n INSERT INTO posthog_persondistinctid (distinct_id, person_id, team_id, version)\n VALUES (\n- $${11 + index + distinctIds!.length - 1},\n+ $${(forcedId ? 12 : 11) + index + distinctIds!.length - 1},",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2284651280",
"repo_full_name": "PostHog/posthog",
"pr_number": 36529,
"pr_file": "plugin-server/src/worker/ingestion/persons/repositories/postgres-person-repository.ts",
"discussion_id": "2284651280",
"commented_code": "@@ -272,54 +277,80 @@ export class PostgresPersonRepository\n try {\n const { rows } = await this.postgres.query<RawPerson>(\n tx ?? PostgresUse.PERSONS_WRITE,\n- `WITH inserted_person AS (\n+ `${\n+ forcedId\n+ ? `WITH inserted_person AS (\n+ INSERT INTO posthog_person (\n+ id, created_at, properties, properties_last_updated_at,\n+ properties_last_operation, team_id, is_user_id, is_identified, uuid, version\n+ )\n+ VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)\n+ RETURNING *\n+ )`\n+ : `WITH inserted_person AS (\n INSERT INTO posthog_person (\n created_at, properties, properties_last_updated_at,\n properties_last_operation, team_id, is_user_id, is_identified, uuid, version\n )\n VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)\n RETURNING *\n- )` +\n+ )`\n+ } ` +\n distinctIds\n .map(\n // NOTE: Keep this in sync with the posthog_persondistinctid INSERT in\n // `addDistinctId`\n (_, index) => `, distinct_id_${index} AS (\n INSERT INTO posthog_persondistinctid (distinct_id, person_id, team_id, version)\n VALUES (\n- $${11 + index + distinctIds!.length - 1},\n+ $${(forcedId ? 12 : 11) + index + distinctIds!.length - 1},",
"comment_created_at": "2025-08-19T09:16:58+00:00",
"comment_author": "pl",
"comment_body": "question: Instead of adding forced ID conditionally, can we always add it and set the value in the query to null? I'd think that when passed null, the ID will be assigned by the autoincrement.",
"pr_file_module": null
}
]
},
{
"discussion_id": "2272899689",
"pr_number": 36474,
"pr_file": "frontend/src/scenes/data-warehouse/externalDataSourcesLogic.ts",
"created_at": "2025-08-13T10:36:21+00:00",
"commented_code": "return {\n ...values.dataWarehouseSources,\n results:\n- values.dataWarehouseSources?.results.map((s) =>\n+ values.dataWarehouseSources?.results.map((s: ExternalDataSource) =>\n s.id === updatedSource.id ? updatedSource : s\n ) || [],\n }\n },\n },\n ],\n+ totalRowsProcessed: [\n+ 0 as number,\n+ {\n+ loadTotalRowsProcessed: async ({ materializedViews }: { materializedViews: any[] }) => {\n+ const dataSources = values.dataWarehouseSources?.results || []\n+\n+ const monthStartISO = getMonthStartISO()\n+\n+ const [schemaResults, materializationResults] = await Promise.all([\n+ Promise.all(\n+ dataSources.map(async (source: ExternalDataSource) => {\n+ try {\n+ const jobs = await api.externalDataSources.jobs(source.id, monthStartISO, null)\n+ return sumMTDRows(jobs, monthStartISO)\n+ } catch (error) {\n+ posthog.captureException(error)\n+ return 0\n+ }\n+ })\n+ ),\n+\n+ Promise.all(\n+ materializedViews.map(async (view: any) => {\n+ try {\n+ const res = await api.dataWarehouseSavedQueries.dataWarehouseDataModelingJobs.list(\n+ view.id,\n+ DATA_WAREHOUSE_CONFIG.maxJobsForMTD,\n+ 0\n+ )\n+ return sumMTDRows(res.results || [], monthStartISO)\n+ } catch (error) {\n+ posthog.captureException(error)\n+ return 0\n+ }\n+ })\n+ ),\n+ ])",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2272899689",
"repo_full_name": "PostHog/posthog",
"pr_number": 36474,
"pr_file": "frontend/src/scenes/data-warehouse/externalDataSourcesLogic.ts",
"discussion_id": "2272899689",
"commented_code": "@@ -35,13 +76,143 @@ export const externalDataSourcesLogic = kea<externalDataSourcesLogicType>([\n return {\n ...values.dataWarehouseSources,\n results:\n- values.dataWarehouseSources?.results.map((s) =>\n+ values.dataWarehouseSources?.results.map((s: ExternalDataSource) =>\n s.id === updatedSource.id ? updatedSource : s\n ) || [],\n }\n },\n },\n ],\n+ totalRowsProcessed: [\n+ 0 as number,\n+ {\n+ loadTotalRowsProcessed: async ({ materializedViews }: { materializedViews: any[] }) => {\n+ const dataSources = values.dataWarehouseSources?.results || []\n+\n+ const monthStartISO = getMonthStartISO()\n+\n+ const [schemaResults, materializationResults] = await Promise.all([\n+ Promise.all(\n+ dataSources.map(async (source: ExternalDataSource) => {\n+ try {\n+ const jobs = await api.externalDataSources.jobs(source.id, monthStartISO, null)\n+ return sumMTDRows(jobs, monthStartISO)\n+ } catch (error) {\n+ posthog.captureException(error)\n+ return 0\n+ }\n+ })\n+ ),\n+\n+ Promise.all(\n+ materializedViews.map(async (view: any) => {\n+ try {\n+ const res = await api.dataWarehouseSavedQueries.dataWarehouseDataModelingJobs.list(\n+ view.id,\n+ DATA_WAREHOUSE_CONFIG.maxJobsForMTD,\n+ 0\n+ )\n+ return sumMTDRows(res.results || [], monthStartISO)\n+ } catch (error) {\n+ posthog.captureException(error)\n+ return 0\n+ }\n+ })\n+ ),\n+ ])",
"comment_created_at": "2025-08-13T10:36:21+00:00",
"comment_author": "Gilbert09",
"comment_body": "This will execute 85 requests for team 2 (62 mat views + 23 sources), right? This will be too much - I wouldn't shy away from making an endpoint for loading data for this page if we have a bunch of custom requirements here - we can do both of these with a single DB query each. ",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,43 @@
---
title: Optimize database query patterns
description: Avoid N+1 query problems and overly complex conditional SQL construction.
When loading related data, prefer batch operations or dedicated endpoints that can
fetch all required data in fewer queries. For conditional SQL parameters, consider
using null values with database defaults instead of constructing different query
strings.
repository: PostHog/posthog
label: Database
language: TypeScript
comments_count: 2
repository_stars: 28460
---

Avoid N+1 query problems and overly complex conditional SQL construction. When loading related data, prefer batch operations or dedicated endpoints that can fetch all required data in fewer queries. For conditional SQL parameters, consider using null values with database defaults instead of constructing different query strings.

Example of the problem:
```typescript
// Avoid: Multiple individual requests (N+1 pattern)
const results = await Promise.all(
    dataSources.map(async (source) => {
        const jobs = await api.externalDataSources.jobs(source.id, monthStartISO, null)
        return sumMTDRows(jobs, monthStartISO)
    })
)

// Avoid: Complex conditional SQL construction
const query = forcedId
    ? `INSERT INTO posthog_person (id, created_at, ...) VALUES ($1, $2, ...)`
    : `INSERT INTO posthog_person (created_at, ...) VALUES ($1, $2, ...)`
```

Better approaches:
```typescript
// Prefer: Single batch request
const allJobsData = await api.externalDataSources.batchJobs(dataSourceIds, monthStartISO)

// Prefer: Always include the id parameter and fall back to the database
// sequence when it is null, instead of building two query strings
const query = `INSERT INTO posthog_person (id, created_at, ...) VALUES ($1, $2, ...)`
// Pass null for id when auto-increment is desired (see the sketch below)
```

This reduces database load, improves performance, and simplifies code maintenance by eliminating conditional query construction.
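
Note that in Postgres, explicitly inserting NULL into a serial column does not fall back to the default, so the single-query version needs its own fallback. A minimal sketch of one way to do this, assuming a node-postgres style client and a sequence named `posthog_person_id_seq` (both illustrative assumptions, not the repository's actual code):
```typescript
import { Pool } from 'pg'

// One query shape serves both forced and auto-assigned ids. COALESCE falls
// back to the sequence (name assumed) when forcedId is null; Postgres only
// evaluates nextval() when the first argument actually is NULL.
const INSERT_PERSON_SQL = `
    INSERT INTO posthog_person (id, created_at, team_id, uuid, version)
    VALUES (COALESCE($1::bigint, nextval('posthog_person_id_seq')), $2, $3, $4, $5)
    RETURNING *`

async function insertPerson(
    pool: Pool,
    forcedId: number | null,
    createdAt: string,
    teamId: number,
    uuid: string,
    version: number
): Promise<unknown> {
    // Single code path: the parameter list never changes shape
    const { rows } = await pool.query(INSERT_PERSON_SQL, [forcedId, createdAt, teamId, uuid, version])
    return rows[0]
}
```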

View File

@@ -0,0 +1,126 @@
[
{
"discussion_id": "2270897306",
"pr_number": 36374,
"pr_file": "products/surveys/backend/max_tools.py",
"created_at": "2025-08-12T19:07:19+00:00",
"commented_code": "survey_data[\"appearance\"] = appearance\n \n return survey_data\n+\n+\n+class FeatureFlagToolkit(TaxonomyAgentToolkit):\n+ \"\"\"Toolkit for feature flag lookup operations.\"\"\"\n+\n+ def __init__(self, team: Team):\n+ super().__init__(team)\n+ self._last_lookup_result: FeatureFlagLookupResult | None = None\n+\n+ def get_tools(self) -> list:\n+ \"\"\"Get all tools (default + custom). Override in subclasses to add custom tools.\"\"\"\n+ return self._get_custom_tools()\n+\n+ def _get_custom_tools(self) -> list:\n+ \"\"\"Get custom tools for feature flag lookup.\"\"\"\n+\n+ class lookup_feature_flag(BaseModel):\n+ \"\"\"\n+ Use this tool to lookup a feature flag by its key/name to get detailed information including ID and variants.\n+ Returns a message with the flag ID and the variants if the flag is found and the variants are available.\n+ \"\"\"\n+\n+ flag_key: str = Field(description=\"The key/name of the feature flag to look up\")\n+\n+ class final_answer(base_final_answer[SurveyCreationSchema]):\n+ __doc__ = base_final_answer.__doc__\n+\n+ return [lookup_feature_flag, final_answer]\n+\n+ def handle_tools(self, tool_name: str, tool_input) -> tuple[str, str]:\n+ \"\"\"Handle custom tool execution.\"\"\"\n+ if tool_name == \"lookup_feature_flag\":\n+ result = self._lookup_feature_flag(tool_input.arguments.flag_key)\n+ return tool_name, result\n+ return super().handle_tools(tool_name, tool_input)\n+\n+ def _lookup_feature_flag(self, flag_key: str) -> str:\n+ \"\"\"Look up feature flag information by key.\"\"\"\n+ try:\n+ # Look up the feature flag by key for the current team\n+ feature_flag = FeatureFlag.objects.select_related(\"team\").get(key=flag_key, team_id=self._team.id)",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2270897306",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "products/surveys/backend/max_tools.py",
"discussion_id": "2270897306",
"commented_code": "@@ -167,3 +151,133 @@ def _prepare_survey_data(self, survey_schema: SurveyCreationSchema, team: Team)\n survey_data[\"appearance\"] = appearance\n \n return survey_data\n+\n+\n+class FeatureFlagToolkit(TaxonomyAgentToolkit):\n+ \"\"\"Toolkit for feature flag lookup operations.\"\"\"\n+\n+ def __init__(self, team: Team):\n+ super().__init__(team)\n+ self._last_lookup_result: FeatureFlagLookupResult | None = None\n+\n+ def get_tools(self) -> list:\n+ \"\"\"Get all tools (default + custom). Override in subclasses to add custom tools.\"\"\"\n+ return self._get_custom_tools()\n+\n+ def _get_custom_tools(self) -> list:\n+ \"\"\"Get custom tools for feature flag lookup.\"\"\"\n+\n+ class lookup_feature_flag(BaseModel):\n+ \"\"\"\n+ Use this tool to lookup a feature flag by its key/name to get detailed information including ID and variants.\n+ Returns a message with the flag ID and the variants if the flag is found and the variants are available.\n+ \"\"\"\n+\n+ flag_key: str = Field(description=\"The key/name of the feature flag to look up\")\n+\n+ class final_answer(base_final_answer[SurveyCreationSchema]):\n+ __doc__ = base_final_answer.__doc__\n+\n+ return [lookup_feature_flag, final_answer]\n+\n+ def handle_tools(self, tool_name: str, tool_input) -> tuple[str, str]:\n+ \"\"\"Handle custom tool execution.\"\"\"\n+ if tool_name == \"lookup_feature_flag\":\n+ result = self._lookup_feature_flag(tool_input.arguments.flag_key)\n+ return tool_name, result\n+ return super().handle_tools(tool_name, tool_input)\n+\n+ def _lookup_feature_flag(self, flag_key: str) -> str:\n+ \"\"\"Look up feature flag information by key.\"\"\"\n+ try:\n+ # Look up the feature flag by key for the current team\n+ feature_flag = FeatureFlag.objects.select_related(\"team\").get(key=flag_key, team_id=self._team.id)",
"comment_created_at": "2025-08-12T19:07:19+00:00",
"comment_author": "lucasheriques",
"comment_body": "```suggestion\r\n feature_flag = FeatureFlag.objects.get(key=flag_key, team_id=self._team.id)\r\n```\r\n\r\n`select_related(\"team\")` is unnecessary here since we're filtering by `team_id` and don't use the team object",
"pr_file_module": null
}
]
},
{
"discussion_id": "2251849409",
"pr_number": 35726,
"pr_file": "ee/hogai/graph/insights/nodes.py",
"created_at": "2025-08-04T15:32:55+00:00",
"commented_code": "\"insight__team\",\n \"insight__short_id\",\n \"insight__query\",\n+ \"insight__filters\",\n )\n .order_by(\"insight_id\", \"-last_viewed_at\")\n .distinct(\"insight_id\")\n )\n \n- self._all_insights = list(\n+ def _get_total_insights_count(self) -> int:\n+ if self._total_insights_count is None:\n+ self._total_insights_count = self._get_insights_queryset().count()\n+ return self._total_insights_count\n+\n+ def _load_insights_page(self, page_number: int) -> list[dict]:",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2251849409",
"repo_full_name": "PostHog/posthog",
"pr_number": 35726,
"pr_file": "ee/hogai/graph/insights/nodes.py",
"discussion_id": "2251849409",
"commented_code": "@@ -135,33 +142,52 @@ def _load_all_insights(self) -> None:\n \"insight__team\",\n \"insight__short_id\",\n \"insight__query\",\n+ \"insight__filters\",\n )\n .order_by(\"insight_id\", \"-last_viewed_at\")\n .distinct(\"insight_id\")\n )\n \n- self._all_insights = list(\n+ def _get_total_insights_count(self) -> int:\n+ if self._total_insights_count is None:\n+ self._total_insights_count = self._get_insights_queryset().count()\n+ return self._total_insights_count\n+\n+ def _load_insights_page(self, page_number: int) -> list[dict]:",
"comment_created_at": "2025-08-04T15:32:55+00:00",
"comment_author": "kappa90",
"comment_body": "You're loading `InsightViewed` then extracting fields from the related `Insight` model as a dict, and then you reconstruct the insight queries and filters later in other parts of the class. This sounds convoluted. You should use the `Insight` model directly, ar the viewed one would be a sunset. Then you can directly keep the loaded insights in `loaded_pages`, and avoid the formatting down the line.",
"pr_file_module": null
},
{
"comment_id": "2251873141",
"repo_full_name": "PostHog/posthog",
"pr_number": 35726,
"pr_file": "ee/hogai/graph/insights/nodes.py",
"discussion_id": "2251849409",
"commented_code": "@@ -135,33 +142,52 @@ def _load_all_insights(self) -> None:\n \"insight__team\",\n \"insight__short_id\",\n \"insight__query\",\n+ \"insight__filters\",\n )\n .order_by(\"insight_id\", \"-last_viewed_at\")\n .distinct(\"insight_id\")\n )\n \n- self._all_insights = list(\n+ def _get_total_insights_count(self) -> int:\n+ if self._total_insights_count is None:\n+ self._total_insights_count = self._get_insights_queryset().count()\n+ return self._total_insights_count\n+\n+ def _load_insights_page(self, page_number: int) -> list[dict]:",
"comment_created_at": "2025-08-04T15:44:04+00:00",
"comment_author": "kappa90",
"comment_body": "Also, this is a blocking thread, use async queries.",
"pr_file_module": null
}
]
},
{
"discussion_id": "2250936978",
"pr_number": 36014,
"pr_file": "dags/web_preaggregated_team_selection_strategies.py",
"created_at": "2025-08-04T09:31:44+00:00",
"commented_code": "+import os\n+from abc import ABC, abstractmethod\n+\n+import dagster\n+from posthog.clickhouse.client import sync_execute\n+from posthog.models.web_preaggregated.team_selection import (\n+ DEFAULT_TOP_TEAMS_BY_PAGEVIEWS_LIMIT,\n+ get_top_teams_by_median_pageviews_sql,\n+)\n+\n+\n+class TeamSelectionStrategy(ABC):\n+ \"\"\"Abstract base class for team selection strategies.\"\"\"\n+\n+ @abstractmethod\n+ def get_teams(self, context: dagster.OpExecutionContext) -> set[int]:\n+ \"\"\"Get teams using this strategy.\"\"\"\n+ pass\n+\n+ @abstractmethod\n+ def get_name(self) -> str:\n+ \"\"\"Get the strategy name.\"\"\"\n+ pass\n+\n+\n+class EnvironmentVariableStrategy(TeamSelectionStrategy):\n+ \"\"\"Select teams from environment variable configuration.\"\"\"\n+\n+ def get_name(self) -> str:\n+ return \"environment_variable\"\n+\n+ def get_teams(self, context: dagster.OpExecutionContext) -> set[int]:\n+ env_teams = os.getenv(\"WEB_ANALYTICS_ENABLED_TEAM_IDS\")\n+ if not env_teams:\n+ context.log.info(\"No teams found in WEB_ANALYTICS_ENABLED_TEAM_IDS environment variable\")\n+ return set()\n+\n+ team_ids = set()\n+ invalid_ids = []\n+\n+ for tid in env_teams.split(\",\"):\n+ tid = tid.strip()\n+ if tid:\n+ try:\n+ team_ids.add(int(tid))\n+ except ValueError:\n+ invalid_ids.append(tid)\n+\n+ if invalid_ids:\n+ context.log.warning(f\"Invalid team IDs in environment variable: {invalid_ids}\")\n+\n+ context.log.info(f\"Found {len(team_ids)} valid teams from environment variable\")\n+ return team_ids\n+\n+\n+class HighPageviewsStrategy(TeamSelectionStrategy):\n+ \"\"\"Select teams with the highest pageview counts (default: 30).\"\"\"\n+\n+ def get_name(self) -> str:\n+ return \"high_pageviews\"\n+\n+ def get_teams(self, context: dagster.OpExecutionContext) -> set[int]:\n+ try:\n+ limit = int(os.getenv(\"WEB_ANALYTICS_TOP_TEAMS_LIMIT\", str(DEFAULT_TOP_TEAMS_BY_PAGEVIEWS_LIMIT)))\n+ sql = get_top_teams_by_median_pageviews_sql(limit)\n+ result = sync_execute(sql)\n+ team_ids = {row[0] for row in result}\n+ context.log.info(f\"Found {len(team_ids)} teams with high pageviews\")\n+ return team_ids\n+ except ValueError as e:\n+ context.log.exception(f\"Invalid configuration for pageviews query: {e}\")\n+ return set()\n+ except Exception as e:\n+ context.log.warning(f\"Failed to fetch top teams by pageviews: {e}\")\n+ return set()\n+\n+\n+class FeatureEnrollmentStrategy(TeamSelectionStrategy):\n+ \"\"\"Select teams where users have enrolled in a specific feature preview (default: web-analytics-api).\"\"\"\n+\n+ def get_name(self) -> str:\n+ return \"feature_enrollment\"\n+\n+ def get_teams(self, context: dagster.OpExecutionContext) -> set[int]:\n+ flag_key = os.getenv(\"WEB_ANALYTICS_FEATURE_FLAG_KEY\", \"web-analytics-api\")\n+\n+ try:\n+ from posthog.models.person.person import Person\n+\n+ # Query PostgreSQL for teams with enrolled users\n+ enrollment_key = f\"$feature_enrollment/{flag_key}\"\n+ team_ids = (\n+ Person.objects.filter(**{f\"properties__{enrollment_key}\": True})",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2250936978",
"repo_full_name": "PostHog/posthog",
"pr_number": 36014,
"pr_file": "dags/web_preaggregated_team_selection_strategies.py",
"discussion_id": "2250936978",
"commented_code": "@@ -0,0 +1,131 @@\n+import os\n+from abc import ABC, abstractmethod\n+\n+import dagster\n+from posthog.clickhouse.client import sync_execute\n+from posthog.models.web_preaggregated.team_selection import (\n+ DEFAULT_TOP_TEAMS_BY_PAGEVIEWS_LIMIT,\n+ get_top_teams_by_median_pageviews_sql,\n+)\n+\n+\n+class TeamSelectionStrategy(ABC):\n+ \"\"\"Abstract base class for team selection strategies.\"\"\"\n+\n+ @abstractmethod\n+ def get_teams(self, context: dagster.OpExecutionContext) -> set[int]:\n+ \"\"\"Get teams using this strategy.\"\"\"\n+ pass\n+\n+ @abstractmethod\n+ def get_name(self) -> str:\n+ \"\"\"Get the strategy name.\"\"\"\n+ pass\n+\n+\n+class EnvironmentVariableStrategy(TeamSelectionStrategy):\n+ \"\"\"Select teams from environment variable configuration.\"\"\"\n+\n+ def get_name(self) -> str:\n+ return \"environment_variable\"\n+\n+ def get_teams(self, context: dagster.OpExecutionContext) -> set[int]:\n+ env_teams = os.getenv(\"WEB_ANALYTICS_ENABLED_TEAM_IDS\")\n+ if not env_teams:\n+ context.log.info(\"No teams found in WEB_ANALYTICS_ENABLED_TEAM_IDS environment variable\")\n+ return set()\n+\n+ team_ids = set()\n+ invalid_ids = []\n+\n+ for tid in env_teams.split(\",\"):\n+ tid = tid.strip()\n+ if tid:\n+ try:\n+ team_ids.add(int(tid))\n+ except ValueError:\n+ invalid_ids.append(tid)\n+\n+ if invalid_ids:\n+ context.log.warning(f\"Invalid team IDs in environment variable: {invalid_ids}\")\n+\n+ context.log.info(f\"Found {len(team_ids)} valid teams from environment variable\")\n+ return team_ids\n+\n+\n+class HighPageviewsStrategy(TeamSelectionStrategy):\n+ \"\"\"Select teams with the highest pageview counts (default: 30).\"\"\"\n+\n+ def get_name(self) -> str:\n+ return \"high_pageviews\"\n+\n+ def get_teams(self, context: dagster.OpExecutionContext) -> set[int]:\n+ try:\n+ limit = int(os.getenv(\"WEB_ANALYTICS_TOP_TEAMS_LIMIT\", str(DEFAULT_TOP_TEAMS_BY_PAGEVIEWS_LIMIT)))\n+ sql = get_top_teams_by_median_pageviews_sql(limit)\n+ result = sync_execute(sql)\n+ team_ids = {row[0] for row in result}\n+ context.log.info(f\"Found {len(team_ids)} teams with high pageviews\")\n+ return team_ids\n+ except ValueError as e:\n+ context.log.exception(f\"Invalid configuration for pageviews query: {e}\")\n+ return set()\n+ except Exception as e:\n+ context.log.warning(f\"Failed to fetch top teams by pageviews: {e}\")\n+ return set()\n+\n+\n+class FeatureEnrollmentStrategy(TeamSelectionStrategy):\n+ \"\"\"Select teams where users have enrolled in a specific feature preview (default: web-analytics-api).\"\"\"\n+\n+ def get_name(self) -> str:\n+ return \"feature_enrollment\"\n+\n+ def get_teams(self, context: dagster.OpExecutionContext) -> set[int]:\n+ flag_key = os.getenv(\"WEB_ANALYTICS_FEATURE_FLAG_KEY\", \"web-analytics-api\")\n+\n+ try:\n+ from posthog.models.person.person import Person\n+\n+ # Query PostgreSQL for teams with enrolled users\n+ enrollment_key = f\"$feature_enrollment/{flag_key}\"\n+ team_ids = (\n+ Person.objects.filter(**{f\"properties__{enrollment_key}\": True})",
"comment_created_at": "2025-08-04T09:31:44+00:00",
"comment_author": "joshsny",
"comment_body": "I think this will fail in prod - it'll scan the entire persons table in postgres (which I've done before, and it'd just timeout). Maybe it's fine as it's in a dag \ud83e\udd37\r\n\r\nThis probs needs to hit CH instead, see here for an example query: https://github.com/PostHog/posthog/blob/9d5516ce483705fc0912e45b2bb409361396236c/posthog/tasks/early_access_feature.py#L34",
"pr_file_module": null
},
{
"comment_id": "2250938616",
"repo_full_name": "PostHog/posthog",
"pr_number": 36014,
"pr_file": "dags/web_preaggregated_team_selection_strategies.py",
"discussion_id": "2250936978",
"commented_code": "@@ -0,0 +1,131 @@\n+import os\n+from abc import ABC, abstractmethod\n+\n+import dagster\n+from posthog.clickhouse.client import sync_execute\n+from posthog.models.web_preaggregated.team_selection import (\n+ DEFAULT_TOP_TEAMS_BY_PAGEVIEWS_LIMIT,\n+ get_top_teams_by_median_pageviews_sql,\n+)\n+\n+\n+class TeamSelectionStrategy(ABC):\n+ \"\"\"Abstract base class for team selection strategies.\"\"\"\n+\n+ @abstractmethod\n+ def get_teams(self, context: dagster.OpExecutionContext) -> set[int]:\n+ \"\"\"Get teams using this strategy.\"\"\"\n+ pass\n+\n+ @abstractmethod\n+ def get_name(self) -> str:\n+ \"\"\"Get the strategy name.\"\"\"\n+ pass\n+\n+\n+class EnvironmentVariableStrategy(TeamSelectionStrategy):\n+ \"\"\"Select teams from environment variable configuration.\"\"\"\n+\n+ def get_name(self) -> str:\n+ return \"environment_variable\"\n+\n+ def get_teams(self, context: dagster.OpExecutionContext) -> set[int]:\n+ env_teams = os.getenv(\"WEB_ANALYTICS_ENABLED_TEAM_IDS\")\n+ if not env_teams:\n+ context.log.info(\"No teams found in WEB_ANALYTICS_ENABLED_TEAM_IDS environment variable\")\n+ return set()\n+\n+ team_ids = set()\n+ invalid_ids = []\n+\n+ for tid in env_teams.split(\",\"):\n+ tid = tid.strip()\n+ if tid:\n+ try:\n+ team_ids.add(int(tid))\n+ except ValueError:\n+ invalid_ids.append(tid)\n+\n+ if invalid_ids:\n+ context.log.warning(f\"Invalid team IDs in environment variable: {invalid_ids}\")\n+\n+ context.log.info(f\"Found {len(team_ids)} valid teams from environment variable\")\n+ return team_ids\n+\n+\n+class HighPageviewsStrategy(TeamSelectionStrategy):\n+ \"\"\"Select teams with the highest pageview counts (default: 30).\"\"\"\n+\n+ def get_name(self) -> str:\n+ return \"high_pageviews\"\n+\n+ def get_teams(self, context: dagster.OpExecutionContext) -> set[int]:\n+ try:\n+ limit = int(os.getenv(\"WEB_ANALYTICS_TOP_TEAMS_LIMIT\", str(DEFAULT_TOP_TEAMS_BY_PAGEVIEWS_LIMIT)))\n+ sql = get_top_teams_by_median_pageviews_sql(limit)\n+ result = sync_execute(sql)\n+ team_ids = {row[0] for row in result}\n+ context.log.info(f\"Found {len(team_ids)} teams with high pageviews\")\n+ return team_ids\n+ except ValueError as e:\n+ context.log.exception(f\"Invalid configuration for pageviews query: {e}\")\n+ return set()\n+ except Exception as e:\n+ context.log.warning(f\"Failed to fetch top teams by pageviews: {e}\")\n+ return set()\n+\n+\n+class FeatureEnrollmentStrategy(TeamSelectionStrategy):\n+ \"\"\"Select teams where users have enrolled in a specific feature preview (default: web-analytics-api).\"\"\"\n+\n+ def get_name(self) -> str:\n+ return \"feature_enrollment\"\n+\n+ def get_teams(self, context: dagster.OpExecutionContext) -> set[int]:\n+ flag_key = os.getenv(\"WEB_ANALYTICS_FEATURE_FLAG_KEY\", \"web-analytics-api\")\n+\n+ try:\n+ from posthog.models.person.person import Person\n+\n+ # Query PostgreSQL for teams with enrolled users\n+ enrollment_key = f\"$feature_enrollment/{flag_key}\"\n+ team_ids = (\n+ Person.objects.filter(**{f\"properties__{enrollment_key}\": True})",
"comment_created_at": "2025-08-04T09:32:27+00:00",
"comment_author": "joshsny",
"comment_body": "it would be really nice if we had a way for teams to opt into previews for all their users \ud83e\udd14",
"pr_file_module": null
},
{
"comment_id": "2250956446",
"repo_full_name": "PostHog/posthog",
"pr_number": 36014,
"pr_file": "dags/web_preaggregated_team_selection_strategies.py",
"discussion_id": "2250936978",
"commented_code": "@@ -0,0 +1,131 @@\n+import os\n+from abc import ABC, abstractmethod\n+\n+import dagster\n+from posthog.clickhouse.client import sync_execute\n+from posthog.models.web_preaggregated.team_selection import (\n+ DEFAULT_TOP_TEAMS_BY_PAGEVIEWS_LIMIT,\n+ get_top_teams_by_median_pageviews_sql,\n+)\n+\n+\n+class TeamSelectionStrategy(ABC):\n+ \"\"\"Abstract base class for team selection strategies.\"\"\"\n+\n+ @abstractmethod\n+ def get_teams(self, context: dagster.OpExecutionContext) -> set[int]:\n+ \"\"\"Get teams using this strategy.\"\"\"\n+ pass\n+\n+ @abstractmethod\n+ def get_name(self) -> str:\n+ \"\"\"Get the strategy name.\"\"\"\n+ pass\n+\n+\n+class EnvironmentVariableStrategy(TeamSelectionStrategy):\n+ \"\"\"Select teams from environment variable configuration.\"\"\"\n+\n+ def get_name(self) -> str:\n+ return \"environment_variable\"\n+\n+ def get_teams(self, context: dagster.OpExecutionContext) -> set[int]:\n+ env_teams = os.getenv(\"WEB_ANALYTICS_ENABLED_TEAM_IDS\")\n+ if not env_teams:\n+ context.log.info(\"No teams found in WEB_ANALYTICS_ENABLED_TEAM_IDS environment variable\")\n+ return set()\n+\n+ team_ids = set()\n+ invalid_ids = []\n+\n+ for tid in env_teams.split(\",\"):\n+ tid = tid.strip()\n+ if tid:\n+ try:\n+ team_ids.add(int(tid))\n+ except ValueError:\n+ invalid_ids.append(tid)\n+\n+ if invalid_ids:\n+ context.log.warning(f\"Invalid team IDs in environment variable: {invalid_ids}\")\n+\n+ context.log.info(f\"Found {len(team_ids)} valid teams from environment variable\")\n+ return team_ids\n+\n+\n+class HighPageviewsStrategy(TeamSelectionStrategy):\n+ \"\"\"Select teams with the highest pageview counts (default: 30).\"\"\"\n+\n+ def get_name(self) -> str:\n+ return \"high_pageviews\"\n+\n+ def get_teams(self, context: dagster.OpExecutionContext) -> set[int]:\n+ try:\n+ limit = int(os.getenv(\"WEB_ANALYTICS_TOP_TEAMS_LIMIT\", str(DEFAULT_TOP_TEAMS_BY_PAGEVIEWS_LIMIT)))\n+ sql = get_top_teams_by_median_pageviews_sql(limit)\n+ result = sync_execute(sql)\n+ team_ids = {row[0] for row in result}\n+ context.log.info(f\"Found {len(team_ids)} teams with high pageviews\")\n+ return team_ids\n+ except ValueError as e:\n+ context.log.exception(f\"Invalid configuration for pageviews query: {e}\")\n+ return set()\n+ except Exception as e:\n+ context.log.warning(f\"Failed to fetch top teams by pageviews: {e}\")\n+ return set()\n+\n+\n+class FeatureEnrollmentStrategy(TeamSelectionStrategy):\n+ \"\"\"Select teams where users have enrolled in a specific feature preview (default: web-analytics-api).\"\"\"\n+\n+ def get_name(self) -> str:\n+ return \"feature_enrollment\"\n+\n+ def get_teams(self, context: dagster.OpExecutionContext) -> set[int]:\n+ flag_key = os.getenv(\"WEB_ANALYTICS_FEATURE_FLAG_KEY\", \"web-analytics-api\")\n+\n+ try:\n+ from posthog.models.person.person import Person\n+\n+ # Query PostgreSQL for teams with enrolled users\n+ enrollment_key = f\"$feature_enrollment/{flag_key}\"\n+ team_ids = (\n+ Person.objects.filter(**{f\"properties__{enrollment_key}\": True})",
"comment_created_at": "2025-08-04T09:39:51+00:00",
"comment_author": "lricoy",
"comment_body": "Oh, that makes sense. I saw that query and considered it, but I thought this would be more direct as it wouldn't require an exposure event. I will change it!",
"pr_file_module": null
}
]
},
{
"discussion_id": "2247307893",
"pr_number": 35980,
"pr_file": "ee/clickhouse/views/groups.py",
"created_at": "2025-08-01T08:24:38+00:00",
"commented_code": "fields = [\"group_type_index\", \"group_key\", \"group_properties\", \"created_at\"]\n \n \n+class FindGroupSerializer(GroupSerializer):\n+ notebook = serializers.SerializerMethodField()\n+\n+ class Meta:\n+ model = Group\n+ fields = [*GroupSerializer.Meta.fields, \"notebook\"]\n+\n+ def get_notebook(self, obj: Group) -> str | None:\n+ relationship = obj.notebook_relationships.first()\n+ return relationship.notebook.short_id if relationship else None",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2247307893",
"repo_full_name": "PostHog/posthog",
"pr_number": 35980,
"pr_file": "ee/clickhouse/views/groups.py",
"discussion_id": "2247307893",
"commented_code": "@@ -105,6 +108,18 @@ class Meta:\n fields = [\"group_type_index\", \"group_key\", \"group_properties\", \"created_at\"]\n \n \n+class FindGroupSerializer(GroupSerializer):\n+ notebook = serializers.SerializerMethodField()\n+\n+ class Meta:\n+ model = Group\n+ fields = [*GroupSerializer.Meta.fields, \"notebook\"]\n+\n+ def get_notebook(self, obj: Group) -> str | None:\n+ relationship = obj.notebook_relationships.first()\n+ return relationship.notebook.short_id if relationship else None",
"comment_created_at": "2025-08-01T08:24:38+00:00",
"comment_author": "daibhin",
"comment_body": "Can we avoid the N+1 here by doing something like `prefetch_related(\"notebook_relationships__notebook\")` when finding the groups?",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,38 @@
---
title: Optimize ORM queries
description: Optimize Django ORM queries to prevent performance issues and unnecessary
database load. Avoid N+1 query problems by using appropriate prefetch_related()
and select_related() calls. Remove unnecessary select_related() when you're only
filtering by foreign key IDs and not accessing the related object. Choose the correct
database routing (read vs write replicas) based on the operation. For large tables,
prefer database-level filtering over Python-level processing to avoid scanning entire tables.
repository: PostHog/posthog
label: Database
language: Python
comments_count: 4
repository_stars: 28460
---

Optimize Django ORM queries to prevent performance issues and unnecessary database load. Avoid N+1 query problems by using appropriate prefetch_related() and select_related() calls. Remove unnecessary select_related() when you're only filtering by foreign key IDs and not accessing the related object. Choose the correct database routing (read vs write replicas) based on the operation. For large tables, prefer database-level filtering over Python-level processing to avoid scanning entire tables.

Example of unnecessary select_related():
```python
# Bad - unnecessary select_related when only filtering by team_id
feature_flag = FeatureFlag.objects.select_related("team").get(key=flag_key, team_id=self._team.id)

# Good - remove select_related when not using the related object
feature_flag = FeatureFlag.objects.get(key=flag_key, team_id=self._team.id)
```

Example of preventing N+1 queries:
```python
# Bad - causes N+1 queries (one extra query per group)
for obj in Group.objects.all():
    relationship = obj.notebook_relationships.first()

# Good - use prefetch_related and read from the prefetched cache;
# calling .first() would still issue a fresh query per object,
# because slicing bypasses the prefetch cache
groups = Group.objects.prefetch_related("notebook_relationships__notebook")
for obj in groups:
    relationship = next(iter(obj.notebook_relationships.all()), None)
```

For large datasets, use database-level operations instead of scanning entire tables in application code, especially in production environments where tables can be very large.
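
As a sketch of those last two points, assuming a read-only database alias named "replica" (the alias and the import path are illustrative assumptions; ExternalDataJob and rows_synced come from the discussions above):
```python
from django.db.models import Sum

# from posthog.warehouse.models import ExternalDataJob  # import path assumed

# Bad - pulls every job into Python just to sum one column
total = sum(job.rows_synced for job in ExternalDataJob.objects.filter(team_id=team_id))

# Good - aggregate in the database, and route the read-only
# reporting query to a read replica
total = (
    ExternalDataJob.objects.using("replica")
    .filter(team_id=team_id)
    .aggregate(total=Sum("rows_synced"))["total"]
    or 0
)
```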

View File

@@ -0,0 +1,138 @@
[
{
"discussion_id": "2276284218",
"pr_number": 36586,
"pr_file": "posthog/warehouse/api/external_data_source.py",
"created_at": "2025-08-14T10:53:42+00:00",
"commented_code": "status=status.HTTP_200_OK,\n data={str(key): value.model_dump() for key, value in configs.items()},\n )\n+\n+ @action(methods=[\"GET\"], detail=False)\n+ def dwh_scene_stats(self, request: Request, *arg: Any, **kwargs: Any):",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2276284218",
"repo_full_name": "PostHog/posthog",
"pr_number": 36586,
"pr_file": "posthog/warehouse/api/external_data_source.py",
"discussion_id": "2276284218",
"commented_code": "@@ -610,3 +613,63 @@ def wizard(self, request: Request, *arg: Any, **kwargs: Any):\n status=status.HTTP_200_OK,\n data={str(key): value.model_dump() for key, value in configs.items()},\n )\n+\n+ @action(methods=[\"GET\"], detail=False)\n+ def dwh_scene_stats(self, request: Request, *arg: Any, **kwargs: Any):",
"comment_created_at": "2025-08-14T10:53:42+00:00",
"comment_author": "Gilbert09",
"comment_body": "Nit: can we not include this in the `external_data_source` API viewset, please? Ideally, our viewsets should be RESTful in the sense that they relate to \"objects\" - we're trying to fudge this aggregation endpoint in a place that doesn't make much sense. I'd opt for a new api endpoint like`/api/data_warehouse`",
"pr_file_module": null
},
{
"comment_id": "2276746510",
"repo_full_name": "PostHog/posthog",
"pr_number": 36586,
"pr_file": "posthog/warehouse/api/external_data_source.py",
"discussion_id": "2276284218",
"commented_code": "@@ -610,3 +613,63 @@ def wizard(self, request: Request, *arg: Any, **kwargs: Any):\n status=status.HTTP_200_OK,\n data={str(key): value.model_dump() for key, value in configs.items()},\n )\n+\n+ @action(methods=[\"GET\"], detail=False)\n+ def dwh_scene_stats(self, request: Request, *arg: Any, **kwargs: Any):",
"comment_created_at": "2025-08-14T14:07:49+00:00",
"comment_author": "naumaanh",
"comment_body": "yes! i can move this into a new file. that makes more sense as most likely multiple other endpoints will need to be registered as well!",
"pr_file_module": null
}
]
},
{
"discussion_id": "2276290936",
"pr_number": 36586,
"pr_file": "posthog/warehouse/api/external_data_source.py",
"created_at": "2025-08-14T10:56:50+00:00",
"commented_code": "status=status.HTTP_200_OK,\n data={str(key): value.model_dump() for key, value in configs.items()},\n )\n+\n+ @action(methods=[\"GET\"], detail=False)\n+ def dwh_scene_stats(self, request: Request, *arg: Any, **kwargs: Any):\n+ \"\"\"\n+ Returns aggregated statistics for the data warehouse scene including total rows processed.\n+ Used by the frontend data warehouse scene to display usage information.\n+ \"\"\"\n+ billing_interval = \"\"\n+ rows_synced = 0\n+ data_modeling_rows = 0\n+\n+ try:\n+ billing_manager = BillingManager(get_cached_instance_license())\n+ org_billing = billing_manager.get_billing(organization=self.team.organization)\n+\n+ if org_billing and org_billing.get(\"billing_period\"):\n+ billing_period = org_billing[\"billing_period\"]\n+ billing_period_start = parser.parse(billing_period[\"current_period_start\"])\n+ billing_period_end = parser.parse(billing_period[\"current_period_end\"])\n+ billing_interval = billing_period.get(\"interval\", \"month\")\n+\n+ usage_summary = org_billing.get(\"usage_summary\", {})\n+ billing_tracked_rows = usage_summary.get(\"rows_synced\", {}).get(\"usage\", 0)\n+\n+ all_external_jobs = ExternalDataJob.objects.filter(\n+ team_id=self.team_id,\n+ created_at__gte=billing_period_start,\n+ created_at__lt=billing_period_end,\n+ billable=True,\n+ )\n+ total_db_rows = all_external_jobs.aggregate(total=Sum(\"rows_synced\"))[\"total\"] or 0\n+\n+ pending_billing_rows = max(0, total_db_rows - billing_tracked_rows)\n+\n+ rows_synced = billing_tracked_rows + pending_billing_rows\n+\n+ data_modeling_jobs = DataModelingJob.objects.filter(\n+ team_id=self.team_id,\n+ created_at__gte=billing_period_start,\n+ created_at__lt=billing_period_end,\n+ )\n+ data_modeling_rows = data_modeling_jobs.aggregate(total=Sum(\"rows_materialized\"))[\"total\"] or 0\n+\n+ except Exception as e:\n+ logger.exception(\"Could not retrieve billing information\", exc_info=e)\n+\n+ return Response(\n+ status=status.HTTP_200_OK,\n+ data={\n+ \"billingInterval\": billing_interval,\n+ \"externalData\": {\n+ \"billingPeriodEnd\": billing_period_end,\n+ \"billingPeriodStart\": billing_period_start,\n+ \"dataModelingRows\": data_modeling_rows,\n+ \"trackedBillingRows\": billing_tracked_rows,\n+ \"pendingBillingRows\": pending_billing_rows,\n+ \"totalRows\": rows_synced,\n+ },\n+ },",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2276290936",
"repo_full_name": "PostHog/posthog",
"pr_number": 36586,
"pr_file": "posthog/warehouse/api/external_data_source.py",
"discussion_id": "2276290936",
"commented_code": "@@ -610,3 +613,63 @@ def wizard(self, request: Request, *arg: Any, **kwargs: Any):\n status=status.HTTP_200_OK,\n data={str(key): value.model_dump() for key, value in configs.items()},\n )\n+\n+ @action(methods=[\"GET\"], detail=False)\n+ def dwh_scene_stats(self, request: Request, *arg: Any, **kwargs: Any):\n+ \"\"\"\n+ Returns aggregated statistics for the data warehouse scene including total rows processed.\n+ Used by the frontend data warehouse scene to display usage information.\n+ \"\"\"\n+ billing_interval = \"\"\n+ rows_synced = 0\n+ data_modeling_rows = 0\n+\n+ try:\n+ billing_manager = BillingManager(get_cached_instance_license())\n+ org_billing = billing_manager.get_billing(organization=self.team.organization)\n+\n+ if org_billing and org_billing.get(\"billing_period\"):\n+ billing_period = org_billing[\"billing_period\"]\n+ billing_period_start = parser.parse(billing_period[\"current_period_start\"])\n+ billing_period_end = parser.parse(billing_period[\"current_period_end\"])\n+ billing_interval = billing_period.get(\"interval\", \"month\")\n+\n+ usage_summary = org_billing.get(\"usage_summary\", {})\n+ billing_tracked_rows = usage_summary.get(\"rows_synced\", {}).get(\"usage\", 0)\n+\n+ all_external_jobs = ExternalDataJob.objects.filter(\n+ team_id=self.team_id,\n+ created_at__gte=billing_period_start,\n+ created_at__lt=billing_period_end,\n+ billable=True,\n+ )\n+ total_db_rows = all_external_jobs.aggregate(total=Sum(\"rows_synced\"))[\"total\"] or 0\n+\n+ pending_billing_rows = max(0, total_db_rows - billing_tracked_rows)\n+\n+ rows_synced = billing_tracked_rows + pending_billing_rows\n+\n+ data_modeling_jobs = DataModelingJob.objects.filter(\n+ team_id=self.team_id,\n+ created_at__gte=billing_period_start,\n+ created_at__lt=billing_period_end,\n+ )\n+ data_modeling_rows = data_modeling_jobs.aggregate(total=Sum(\"rows_materialized\"))[\"total\"] or 0\n+\n+ except Exception as e:\n+ logger.exception(\"Could not retrieve billing information\", exc_info=e)\n+\n+ return Response(\n+ status=status.HTTP_200_OK,\n+ data={\n+ \"billingInterval\": billing_interval,\n+ \"externalData\": {\n+ \"billingPeriodEnd\": billing_period_end,\n+ \"billingPeriodStart\": billing_period_start,\n+ \"dataModelingRows\": data_modeling_rows,\n+ \"trackedBillingRows\": billing_tracked_rows,\n+ \"pendingBillingRows\": pending_billing_rows,\n+ \"totalRows\": rows_synced,\n+ },\n+ },",
"comment_created_at": "2025-08-14T10:56:50+00:00",
"comment_author": "Gilbert09",
"comment_body": "This response doesn't conform to what you're expecting in the frontend:\r\n```typescipt\r\nPromise<{\r\n billingInterval: string\r\n billingPeriodEnd: string\r\n billingPeriodStart: string\r\n dataModelingRows: number\r\n externalData: {\r\n billingTrackedRows: number\r\n pendingBillingRows: number\r\n totalRows: number\r\n }\r\n}>\r\n```",
"pr_file_module": null
}
]
},
{
"discussion_id": "2191021252",
"pr_number": 34580,
"pr_file": "posthog/session_recordings/session_recording_api.py",
"created_at": "2025-07-07T21:13:51+00:00",
"commented_code": "recording.deleted = True\n recording.save()\n \n+ # Also need to remove from playlist items if it's in one\n+ SessionRecordingPlaylistItem.objects.filter(playlist__team=self.team, recording=recording).update(deleted=True)\n+\n return Response({\"success\": True}, status=204)\n \n+ @extend_schema(exclude=True)\n+ def delete(self, request: request.Request, *args: Any, **kwargs: Any) -> Response:\n+ \"\"\"\n+ Bulk soft delete all recordings matching the provided filters.\n+ Always run asynchronously via Celery.\n+ \"\"\"\n+ user_distinct_id = cast(User, request.user).distinct_id\n+\n+ try:\n+ query = filter_from_params_to_query(request.GET.dict())",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2191021252",
"repo_full_name": "PostHog/posthog",
"pr_number": 34580,
"pr_file": "posthog/session_recordings/session_recording_api.py",
"discussion_id": "2191021252",
"commented_code": "@@ -652,8 +656,98 @@ def destroy(self, request: request.Request, *args: Any, **kwargs: Any) -> Respon\n recording.deleted = True\n recording.save()\n \n+ # Also need to remove from playlist items if it's in one\n+ SessionRecordingPlaylistItem.objects.filter(playlist__team=self.team, recording=recording).update(deleted=True)\n+\n return Response({\"success\": True}, status=204)\n \n+ @extend_schema(exclude=True)\n+ def delete(self, request: request.Request, *args: Any, **kwargs: Any) -> Response:\n+ \"\"\"\n+ Bulk soft delete all recordings matching the provided filters.\n+ Always run asynchronously via Celery.\n+ \"\"\"\n+ user_distinct_id = cast(User, request.user).distinct_id\n+\n+ try:\n+ query = filter_from_params_to_query(request.GET.dict())",
"comment_created_at": "2025-07-07T21:13:51+00:00",
"comment_author": "pauldambra",
"comment_body": "the query results could change between us displaying them in the browser and the async task running\r\n\r\nis that what people would want?\r\n\r\nshould we be sending a set of session ids?\r\nthere's a maximum size to a payload so that gives us an upper limit on how many can be requested for delete in one go.\r\n\r\nit makes me wonder if there's a real need for this UI / query step here - once we add it we have to support it and it's one more feature in the UI that is there for a very, very small proportion of vists\r\n\r\nif we only had a bulk delete API and folk could script against it, what would change here?\r\n\r\ni can't think of a time when that wouldn't have solved the support case... we can even have an example script that we can share with people to show what it would look like\r\n\r\ndeleting many recordings after all should be an exception\r\n\r\n",
"pr_file_module": null
}
]
},
{
"discussion_id": "2204239516",
"pr_number": 33948,
"pr_file": "posthog/api/survey.py",
"created_at": "2025-07-14T08:43:03+00:00",
"commented_code": "),\n )\n \n+ # If survey_id is provided, return individual survey\n+ if survey_id:\n+ try:\n+ survey = Survey.objects.select_related(\"linked_flag\", \"targeting_flag\", \"internal_targeting_flag\").get(\n+ id=survey_id, team=team\n+ )\n+ except Survey.DoesNotExist:\n+ return cors_response(\n+ request,\n+ generate_exception_response(\n+ \"surveys\",\n+ \"Survey not found.\",\n+ type=\"not_found\",\n+ code=\"survey_not_found\",\n+ status_code=status.HTTP_404_NOT_FOUND,\n+ ),\n+ )\n+\n+ # Check if survey is archived\n+ if survey.archived:\n+ return cors_response(\n+ request,\n+ generate_exception_response(\n+ \"surveys\",\n+ \"This survey is no longer available.\",\n+ type=\"not_found\",\n+ code=\"survey_archived\",\n+ status_code=status.HTTP_404_NOT_FOUND,\n+ ),\n+ )\n+\n+ # Return individual survey response\n+ serialized_survey = SurveyAPISerializer(survey).data\n+ response_data = {\n+ \"survey\": serialized_survey,\n+ \"project_config\": {\n+ \"api_host\": request.build_absolute_uri(\"/\").rstrip(\"/\"),\n+ \"token\": team.api_token,\n+ },\n+ }\n+ return cors_response(request, JsonResponse(response_data))\n+\n+ # Return all surveys (existing behavior)\n return cors_response(request, JsonResponse(get_surveys_response(team)))\n \n \n+# Constants for better maintainability\n+logger = structlog.get_logger(__name__)\n+SURVEY_ID_MAX_LENGTH = 50\n+CACHE_TIMEOUT_SECONDS = 300\n+\n+\n+def is_valid_uuid(uuid_string: str) -> bool:\n+ \"\"\"Validate if a string is a valid UUID format.\"\"\"\n+ try:\n+ uuid.UUID(uuid_string)\n+ return True\n+ except (ValueError, TypeError):\n+ return False\n+\n+\n+@csrf_exempt\n+@axes_dispatch\n+def public_survey_page(request, survey_id: str):",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2204239516",
"repo_full_name": "PostHog/posthog",
"pr_number": 33948,
"pr_file": "posthog/api/survey.py",
"discussion_id": "2204239516",
"commented_code": "@@ -1386,9 +1394,178 @@ def surveys(request: Request):\n ),\n )\n \n+ # If survey_id is provided, return individual survey\n+ if survey_id:\n+ try:\n+ survey = Survey.objects.select_related(\"linked_flag\", \"targeting_flag\", \"internal_targeting_flag\").get(\n+ id=survey_id, team=team\n+ )\n+ except Survey.DoesNotExist:\n+ return cors_response(\n+ request,\n+ generate_exception_response(\n+ \"surveys\",\n+ \"Survey not found.\",\n+ type=\"not_found\",\n+ code=\"survey_not_found\",\n+ status_code=status.HTTP_404_NOT_FOUND,\n+ ),\n+ )\n+\n+ # Check if survey is archived\n+ if survey.archived:\n+ return cors_response(\n+ request,\n+ generate_exception_response(\n+ \"surveys\",\n+ \"This survey is no longer available.\",\n+ type=\"not_found\",\n+ code=\"survey_archived\",\n+ status_code=status.HTTP_404_NOT_FOUND,\n+ ),\n+ )\n+\n+ # Return individual survey response\n+ serialized_survey = SurveyAPISerializer(survey).data\n+ response_data = {\n+ \"survey\": serialized_survey,\n+ \"project_config\": {\n+ \"api_host\": request.build_absolute_uri(\"/\").rstrip(\"/\"),\n+ \"token\": team.api_token,\n+ },\n+ }\n+ return cors_response(request, JsonResponse(response_data))\n+\n+ # Return all surveys (existing behavior)\n return cors_response(request, JsonResponse(get_surveys_response(team)))\n \n \n+# Constants for better maintainability\n+logger = structlog.get_logger(__name__)\n+SURVEY_ID_MAX_LENGTH = 50\n+CACHE_TIMEOUT_SECONDS = 300\n+\n+\n+def is_valid_uuid(uuid_string: str) -> bool:\n+ \"\"\"Validate if a string is a valid UUID format.\"\"\"\n+ try:\n+ uuid.UUID(uuid_string)\n+ return True\n+ except (ValueError, TypeError):\n+ return False\n+\n+\n+@csrf_exempt\n+@axes_dispatch\n+def public_survey_page(request, survey_id: str):",
"comment_created_at": "2025-07-14T08:43:03+00:00",
"comment_author": "marandaneto",
"comment_body": "how will we know who's answering the survey? is the link only for anonymous users?",
"pr_file_module": null
},
{
"comment_id": "2206145507",
"repo_full_name": "PostHog/posthog",
"pr_number": 33948,
"pr_file": "posthog/api/survey.py",
"discussion_id": "2204239516",
"commented_code": "@@ -1386,9 +1394,178 @@ def surveys(request: Request):\n ),\n )\n \n+ # If survey_id is provided, return individual survey\n+ if survey_id:\n+ try:\n+ survey = Survey.objects.select_related(\"linked_flag\", \"targeting_flag\", \"internal_targeting_flag\").get(\n+ id=survey_id, team=team\n+ )\n+ except Survey.DoesNotExist:\n+ return cors_response(\n+ request,\n+ generate_exception_response(\n+ \"surveys\",\n+ \"Survey not found.\",\n+ type=\"not_found\",\n+ code=\"survey_not_found\",\n+ status_code=status.HTTP_404_NOT_FOUND,\n+ ),\n+ )\n+\n+ # Check if survey is archived\n+ if survey.archived:\n+ return cors_response(\n+ request,\n+ generate_exception_response(\n+ \"surveys\",\n+ \"This survey is no longer available.\",\n+ type=\"not_found\",\n+ code=\"survey_archived\",\n+ status_code=status.HTTP_404_NOT_FOUND,\n+ ),\n+ )\n+\n+ # Return individual survey response\n+ serialized_survey = SurveyAPISerializer(survey).data\n+ response_data = {\n+ \"survey\": serialized_survey,\n+ \"project_config\": {\n+ \"api_host\": request.build_absolute_uri(\"/\").rstrip(\"/\"),\n+ \"token\": team.api_token,\n+ },\n+ }\n+ return cors_response(request, JsonResponse(response_data))\n+\n+ # Return all surveys (existing behavior)\n return cors_response(request, JsonResponse(get_surveys_response(team)))\n \n \n+# Constants for better maintainability\n+logger = structlog.get_logger(__name__)\n+SURVEY_ID_MAX_LENGTH = 50\n+CACHE_TIMEOUT_SECONDS = 300\n+\n+\n+def is_valid_uuid(uuid_string: str) -> bool:\n+ \"\"\"Validate if a string is a valid UUID format.\"\"\"\n+ try:\n+ uuid.UUID(uuid_string)\n+ return True\n+ except (ValueError, TypeError):\n+ return False\n+\n+\n+@csrf_exempt\n+@axes_dispatch\n+def public_survey_page(request, survey_id: str):",
"comment_created_at": "2025-07-15T02:25:02+00:00",
"comment_author": "lucasheriques",
"comment_body": "for now we don't have a way to identifying users. I think we should merge this code as is, at least regarding identification, and I'll work on it as a fast follow regardless",
"pr_file_module": null
},
{
"comment_id": "2207419458",
"repo_full_name": "PostHog/posthog",
"pr_number": 33948,
"pr_file": "posthog/api/survey.py",
"discussion_id": "2204239516",
"commented_code": "@@ -1386,9 +1394,178 @@ def surveys(request: Request):\n ),\n )\n \n+ # If survey_id is provided, return individual survey\n+ if survey_id:\n+ try:\n+ survey = Survey.objects.select_related(\"linked_flag\", \"targeting_flag\", \"internal_targeting_flag\").get(\n+ id=survey_id, team=team\n+ )\n+ except Survey.DoesNotExist:\n+ return cors_response(\n+ request,\n+ generate_exception_response(\n+ \"surveys\",\n+ \"Survey not found.\",\n+ type=\"not_found\",\n+ code=\"survey_not_found\",\n+ status_code=status.HTTP_404_NOT_FOUND,\n+ ),\n+ )\n+\n+ # Check if survey is archived\n+ if survey.archived:\n+ return cors_response(\n+ request,\n+ generate_exception_response(\n+ \"surveys\",\n+ \"This survey is no longer available.\",\n+ type=\"not_found\",\n+ code=\"survey_archived\",\n+ status_code=status.HTTP_404_NOT_FOUND,\n+ ),\n+ )\n+\n+ # Return individual survey response\n+ serialized_survey = SurveyAPISerializer(survey).data\n+ response_data = {\n+ \"survey\": serialized_survey,\n+ \"project_config\": {\n+ \"api_host\": request.build_absolute_uri(\"/\").rstrip(\"/\"),\n+ \"token\": team.api_token,\n+ },\n+ }\n+ return cors_response(request, JsonResponse(response_data))\n+\n+ # Return all surveys (existing behavior)\n return cors_response(request, JsonResponse(get_surveys_response(team)))\n \n \n+# Constants for better maintainability\n+logger = structlog.get_logger(__name__)\n+SURVEY_ID_MAX_LENGTH = 50\n+CACHE_TIMEOUT_SECONDS = 300\n+\n+\n+def is_valid_uuid(uuid_string: str) -> bool:\n+ \"\"\"Validate if a string is a valid UUID format.\"\"\"\n+ try:\n+ uuid.UUID(uuid_string)\n+ return True\n+ except (ValueError, TypeError):\n+ return False\n+\n+\n+@csrf_exempt\n+@axes_dispatch\n+def public_survey_page(request, survey_id: str):",
"comment_created_at": "2025-07-15T13:02:40+00:00",
"comment_author": "marandaneto",
"comment_body": "is the current API and approach extendable for identified users in the near feature? or will we need to break links?",
"pr_file_module": null
},
{
"comment_id": "2208621256",
"repo_full_name": "PostHog/posthog",
"pr_number": 33948,
"pr_file": "posthog/api/survey.py",
"discussion_id": "2204239516",
"commented_code": "@@ -1386,9 +1394,178 @@ def surveys(request: Request):\n ),\n )\n \n+ # If survey_id is provided, return individual survey\n+ if survey_id:\n+ try:\n+ survey = Survey.objects.select_related(\"linked_flag\", \"targeting_flag\", \"internal_targeting_flag\").get(\n+ id=survey_id, team=team\n+ )\n+ except Survey.DoesNotExist:\n+ return cors_response(\n+ request,\n+ generate_exception_response(\n+ \"surveys\",\n+ \"Survey not found.\",\n+ type=\"not_found\",\n+ code=\"survey_not_found\",\n+ status_code=status.HTTP_404_NOT_FOUND,\n+ ),\n+ )\n+\n+ # Check if survey is archived\n+ if survey.archived:\n+ return cors_response(\n+ request,\n+ generate_exception_response(\n+ \"surveys\",\n+ \"This survey is no longer available.\",\n+ type=\"not_found\",\n+ code=\"survey_archived\",\n+ status_code=status.HTTP_404_NOT_FOUND,\n+ ),\n+ )\n+\n+ # Return individual survey response\n+ serialized_survey = SurveyAPISerializer(survey).data\n+ response_data = {\n+ \"survey\": serialized_survey,\n+ \"project_config\": {\n+ \"api_host\": request.build_absolute_uri(\"/\").rstrip(\"/\"),\n+ \"token\": team.api_token,\n+ },\n+ }\n+ return cors_response(request, JsonResponse(response_data))\n+\n+ # Return all surveys (existing behavior)\n return cors_response(request, JsonResponse(get_surveys_response(team)))\n \n \n+# Constants for better maintainability\n+logger = structlog.get_logger(__name__)\n+SURVEY_ID_MAX_LENGTH = 50\n+CACHE_TIMEOUT_SECONDS = 300\n+\n+\n+def is_valid_uuid(uuid_string: str) -> bool:\n+ \"\"\"Validate if a string is a valid UUID format.\"\"\"\n+ try:\n+ uuid.UUID(uuid_string)\n+ return True\n+ except (ValueError, TypeError):\n+ return False\n+\n+\n+@csrf_exempt\n+@axes_dispatch\n+def public_survey_page(request, survey_id: str):",
"comment_created_at": "2025-07-15T20:38:31+00:00",
"comment_author": "lucasheriques",
"comment_body": "i just tested it, for now it's a very simple thing: just add a query parameter, `distinct_id`, [and use that for identifying users](https://github.com/PostHog/posthog/pull/33948/files#diff-fdea57bef4ea1f195c8313118abe5b886b35d55386ec5d1d57abee11d1c9a5a2R903-R906). just tested and it worked fine \ud83d\ude4f \r\n\r\nthough in the future we might extend this by instead asking the user to provide something like their email directly.\r\n\r\nNo need for breaking links, as url will forever be `external_surveys`",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,53 @@
---
title: RESTful endpoint organization
description: API endpoints should be properly organized around resources and follow
RESTful principles. Avoid placing aggregation or utility endpoints in viewsets where
they don't belong conceptually.
repository: PostHog/posthog
label: API
language: Python
comments_count: 4
repository_stars: 28460
---

API endpoints should be properly organized around resources and follow RESTful principles. Avoid placing aggregation or utility endpoints in viewsets where they don't belong conceptually.

Each viewset should represent a specific resource type, and endpoints within that viewset should operate on that resource. When you need aggregation endpoints or cross-resource operations, create dedicated API endpoints rather than forcing them into existing viewsets.

Example of what to avoid:
```python
# DON'T: Adding an aggregation endpoint to the external_data_source viewset
class ExternalDataSourceViewSet(ModelViewSet):
    @action(methods=["GET"], detail=False)
    def dwh_scene_stats(self, request):  # This doesn't belong here
        # Returns aggregated data warehouse statistics
        pass
```

Example of proper organization:
```python
# DO: Create a dedicated endpoint for aggregations
# (a plain ViewSet, since there is no single model behind it)
class DataWarehouseViewSet(viewsets.ViewSet):
    @action(methods=["GET"], detail=False)
    def scene_stats(self, request):
        # Returns aggregated data warehouse statistics
        pass
```

For individual vs collection resources, handle both cases properly:
```python
# Handle both an individual survey and the survey collection
def surveys(request: Request, survey_id: str | None = None):
    if survey_id:
        # Return the individual survey with proper error handling
        try:
            survey = Survey.objects.get(id=survey_id, team=team)
            return JsonResponse({"survey": SurveyAPISerializer(survey).data})
        except Survey.DoesNotExist:
            return JsonResponse({"error": "Survey not found"}, status=404)
    # Return the collection of surveys
    return JsonResponse(get_surveys_response(team))
```

This approach makes APIs more intuitive, maintainable, and follows REST conventions that frontend developers expect.
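
A dedicated viewset also needs its own route. A minimal sketch using DRF's router, with the URL prefix chosen here for illustration:
```python
from rest_framework import routers

router = routers.DefaultRouter()
# The prefix names the resource, so the aggregation endpoint lives at
# data_warehouse/scene_stats/ instead of being squeezed into the external
# data source viewset; basename is required because a plain ViewSet has
# no queryset for the router to infer it from
router.register(r"data_warehouse", DataWarehouseViewSet, basename="data_warehouse")
```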

View File

@@ -0,0 +1,138 @@
[
{
"discussion_id": "2281664198",
"pr_number": 36307,
"pr_file": "posthog/migrations/0822_alter_team_session_recording_retention_period.py",
"created_at": "2025-08-18T08:26:44+00:00",
"commented_code": null,
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2281664198",
"repo_full_name": "PostHog/posthog",
"pr_number": 36307,
"pr_file": "posthog/migrations/0822_alter_team_session_recording_retention_period.py",
"discussion_id": "2281664198",
"commented_code": null,
"comment_created_at": "2025-08-18T08:26:44+00:00",
"comment_author": "pauldambra",
"comment_body": "i'd collapse these two to one new migration instead of creating and altering in one go\r\ncould also just do that in a different PR and then all the `ambr` changes are out of the way for this PR\r\ni think this setting change is more than safe enough to just go in by itself",
"pr_file_module": null
},
{
"comment_id": "2281664925",
"repo_full_name": "PostHog/posthog",
"pr_number": 36307,
"pr_file": "posthog/migrations/0822_alter_team_session_recording_retention_period.py",
"discussion_id": "2281664198",
"commented_code": null,
"comment_created_at": "2025-08-18T08:27:04+00:00",
"comment_author": "pauldambra",
"comment_body": "we used to have a job that failed if more than one migration (i don't remember why \ud83d\ude48)",
"pr_file_module": null
}
]
},
{
"discussion_id": "2273123945",
"pr_number": 36283,
"pr_file": "posthog/models/cohort/cohort.py",
"created_at": "2025-08-13T11:43:05+00:00",
"commented_code": "is_static = models.BooleanField(default=False)\n \n+ cohort_type = models.CharField(\n+ max_length=20,\n+ choices=COHORT_TYPE_CHOICES,\n+ null=True,\n+ blank=True,",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2273123945",
"repo_full_name": "PostHog/posthog",
"pr_number": 36283,
"pr_file": "posthog/models/cohort/cohort.py",
"discussion_id": "2273123945",
"commented_code": "@@ -164,6 +175,14 @@ class Cohort(FileSystemSyncMixin, RootTeamMixin, models.Model):\n \n is_static = models.BooleanField(default=False)\n \n+ cohort_type = models.CharField(\n+ max_length=20,\n+ choices=COHORT_TYPE_CHOICES,\n+ null=True,\n+ blank=True,",
"comment_created_at": "2025-08-13T11:43:05+00:00",
"comment_author": "meikelmosby",
"comment_body": "leaving this null bc we still need to migrate the types of existing cohorts or do we actually _want_ this to be null?",
"pr_file_module": null
},
{
"comment_id": "2274235279",
"repo_full_name": "PostHog/posthog",
"pr_number": 36283,
"pr_file": "posthog/models/cohort/cohort.py",
"discussion_id": "2273123945",
"commented_code": "@@ -164,6 +175,14 @@ class Cohort(FileSystemSyncMixin, RootTeamMixin, models.Model):\n \n is_static = models.BooleanField(default=False)\n \n+ cohort_type = models.CharField(\n+ max_length=20,\n+ choices=COHORT_TYPE_CHOICES,\n+ null=True,\n+ blank=True,",
"comment_created_at": "2025-08-13T18:02:52+00:00",
"comment_author": "dmarticus",
"comment_body": "the former\u00a0\u2013 I want to do the migration safely, so my vision was that the migration that actually populates the data will be a separate PR. My proposed flow is\r\n\r\nCreate the nullable field -> migrate all existing cohorts based on their filters -> drop null constraint",
"pr_file_module": null
}
]
},
{
"discussion_id": "2273138394",
"pr_number": 36283,
"pr_file": "posthog/models/cohort/cohort.py",
"created_at": "2025-08-13T11:47:58+00:00",
"commented_code": "return True\n return False\n \n+ def determine_cohort_type(self) -> str:\n+ \"\"\"Determine cohort type based on filters\"\"\"\n+ if self.is_static:\n+ return COHORT_TYPE_STATIC\n+\n+ # Analyze all properties to determine maximum complexity\n+ has_cohort_filters = False\n+ has_person_filters = False\n+ has_behavioral_filters = False\n+\n+ for prop in self.properties.flat:\n+ if prop.type == \"cohort\":\n+ has_cohort_filters = True\n+ elif prop.type == \"person\":\n+ has_person_filters = True\n+ elif prop.type == \"behavioral\":\n+ has_behavioral_filters = True\n+\n+ # Return the most complex type\n+ if self.has_analytical_filters:\n+ return COHORT_TYPE_ANALYTICAL\n+ elif has_behavioral_filters:\n+ return COHORT_TYPE_BEHAVIORAL\n+ elif has_person_filters or has_cohort_filters:\n+ return COHORT_TYPE_PERSON_PROPERTY\n+ else:\n+ return COHORT_TYPE_PERSON_PROPERTY # Default for empty cohorts\n+\n+ def can_be_used_in_feature_flag(self) -> bool:\n+ \"\"\"Determine if cohort can be used in feature flag targeting\"\"\"\n+ if self.is_static:\n+ return True\n+\n+ # Legacy check for backward compatibility\n+ if not self.cohort_type:\n+ # Fall back to determining type dynamically for unmigrated cohorts\n+ # This ensures consistent behavior for legacy cohorts\n+ determined_type = self.determine_cohort_type()\n+ return determined_type in [COHORT_TYPE_STATIC, COHORT_TYPE_PERSON_PROPERTY]",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2273138394",
"repo_full_name": "PostHog/posthog",
"pr_number": 36283,
"pr_file": "posthog/models/cohort/cohort.py",
"discussion_id": "2273138394",
"commented_code": "@@ -269,6 +289,56 @@ def has_complex_behavioral_filter(self) -> bool:\n return True\n return False\n \n+ def determine_cohort_type(self) -> str:\n+ \"\"\"Determine cohort type based on filters\"\"\"\n+ if self.is_static:\n+ return COHORT_TYPE_STATIC\n+\n+ # Analyze all properties to determine maximum complexity\n+ has_cohort_filters = False\n+ has_person_filters = False\n+ has_behavioral_filters = False\n+\n+ for prop in self.properties.flat:\n+ if prop.type == \"cohort\":\n+ has_cohort_filters = True\n+ elif prop.type == \"person\":\n+ has_person_filters = True\n+ elif prop.type == \"behavioral\":\n+ has_behavioral_filters = True\n+\n+ # Return the most complex type\n+ if self.has_analytical_filters:\n+ return COHORT_TYPE_ANALYTICAL\n+ elif has_behavioral_filters:\n+ return COHORT_TYPE_BEHAVIORAL\n+ elif has_person_filters or has_cohort_filters:\n+ return COHORT_TYPE_PERSON_PROPERTY\n+ else:\n+ return COHORT_TYPE_PERSON_PROPERTY # Default for empty cohorts\n+\n+ def can_be_used_in_feature_flag(self) -> bool:\n+ \"\"\"Determine if cohort can be used in feature flag targeting\"\"\"\n+ if self.is_static:\n+ return True\n+\n+ # Legacy check for backward compatibility\n+ if not self.cohort_type:\n+ # Fall back to determining type dynamically for unmigrated cohorts\n+ # This ensures consistent behavior for legacy cohorts\n+ determined_type = self.determine_cohort_type()\n+ return determined_type in [COHORT_TYPE_STATIC, COHORT_TYPE_PERSON_PROPERTY]",
"comment_created_at": "2025-08-13T11:47:58+00:00",
"comment_author": "meikelmosby",
"comment_body": "i wish we would have split this PR differently .. as in \n\n1. introduce the cohort_type & create the new cohort_type for every new cohort created\n2. do the actual migration of migrating existing cohorts to the new types \n3. -- now we have state were every new cohort created has a type and every old cohort is migrated over -- \n4. introduce the rest of the logic etc (we would not need any of the backwards compatible query stuff to dynamically determine the cohort types.. )\n",
"pr_file_module": null
},
{
"comment_id": "2274236102",
"repo_full_name": "PostHog/posthog",
"pr_number": 36283,
"pr_file": "posthog/models/cohort/cohort.py",
"discussion_id": "2273138394",
"commented_code": "@@ -269,6 +289,56 @@ def has_complex_behavioral_filter(self) -> bool:\n return True\n return False\n \n+ def determine_cohort_type(self) -> str:\n+ \"\"\"Determine cohort type based on filters\"\"\"\n+ if self.is_static:\n+ return COHORT_TYPE_STATIC\n+\n+ # Analyze all properties to determine maximum complexity\n+ has_cohort_filters = False\n+ has_person_filters = False\n+ has_behavioral_filters = False\n+\n+ for prop in self.properties.flat:\n+ if prop.type == \"cohort\":\n+ has_cohort_filters = True\n+ elif prop.type == \"person\":\n+ has_person_filters = True\n+ elif prop.type == \"behavioral\":\n+ has_behavioral_filters = True\n+\n+ # Return the most complex type\n+ if self.has_analytical_filters:\n+ return COHORT_TYPE_ANALYTICAL\n+ elif has_behavioral_filters:\n+ return COHORT_TYPE_BEHAVIORAL\n+ elif has_person_filters or has_cohort_filters:\n+ return COHORT_TYPE_PERSON_PROPERTY\n+ else:\n+ return COHORT_TYPE_PERSON_PROPERTY # Default for empty cohorts\n+\n+ def can_be_used_in_feature_flag(self) -> bool:\n+ \"\"\"Determine if cohort can be used in feature flag targeting\"\"\"\n+ if self.is_static:\n+ return True\n+\n+ # Legacy check for backward compatibility\n+ if not self.cohort_type:\n+ # Fall back to determining type dynamically for unmigrated cohorts\n+ # This ensures consistent behavior for legacy cohorts\n+ determined_type = self.determine_cohort_type()\n+ return determined_type in [COHORT_TYPE_STATIC, COHORT_TYPE_PERSON_PROPERTY]",
"comment_created_at": "2025-08-13T18:03:17+00:00",
"comment_author": "dmarticus",
"comment_body": "this is super fair and I'm gonna blow up this PR into 3 parts, as I mentioned here: https://posthog.slack.com/archives/C09494B1AN5/p1755092140443499?thread_ts=1755087666.737709&cid=C09494B1AN5",
"pr_file_module": null
}
]
},
{
"discussion_id": "2246534353",
"pr_number": 35980,
"pr_file": "posthog/migrations/0816_notebook_hidden.py",
"created_at": "2025-07-31T22:55:37+00:00",
"commented_code": "+# Generated by Django 4.2.22 on 2025-07-31 18:16\n+\n+from django.db import migrations, models",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2246534353",
"repo_full_name": "PostHog/posthog",
"pr_number": 35980,
"pr_file": "posthog/migrations/0816_notebook_hidden.py",
"discussion_id": "2246534353",
"commented_code": "@@ -0,0 +1,15 @@\n+# Generated by Django 4.2.22 on 2025-07-31 18:16\n+\n+from django.db import migrations, models",
"comment_created_at": "2025-07-31T22:55:37+00:00",
"comment_author": "zlwaterfield",
"comment_body": "Feedback on migrations, I don't like PRs with multiple migrations because if one succeeds and the other one fails, the deploy could be stopped and the db could get out of sync with the codebase. ",
"pr_file_module": null
},
{
"comment_id": "2247805819",
"repo_full_name": "PostHog/posthog",
"pr_number": 35980,
"pr_file": "posthog/migrations/0816_notebook_hidden.py",
"discussion_id": "2246534353",
"commented_code": "@@ -0,0 +1,15 @@\n+# Generated by Django 4.2.22 on 2025-07-31 18:16\n+\n+from django.db import migrations, models",
"comment_created_at": "2025-08-01T12:02:45+00:00",
"comment_author": "arthurdedeus",
"comment_body": "makes sense! will split this PR into 2",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,45 @@
---
title: Split complex migrations incrementally
description: Break complex schema changes into multiple, sequential migrations to
ensure deployment safety and proper data handling. Each migration should represent
a single, atomic change that can succeed or fail independently.
repository: PostHog/posthog
label: Migrations
language: Python
comments_count: 4
repository_stars: 28460
---
Break complex schema changes into multiple, sequential migrations to ensure deployment safety and proper data handling. Each migration should represent a single, atomic change that can succeed or fail independently.

**Why this matters:**
- Multiple migrations in one PR can cause deployment failures if one succeeds and another fails, leaving the database out of sync with the codebase
- Complex changes are safer when split into logical steps that can be rolled back individually
- An incremental approach allows for safer data migration and validation at each step

**Recommended approach:**
1. **Add nullable field** - Introduce new columns as nullable first
2. **Migrate data** - Populate the new field with appropriate values
3. **Add constraints** - Apply NOT NULL constraints or other restrictions after data is migrated
4. **Remove old fields** - Deprecate or remove old columns in a separate migration

**Example:**

Instead of creating and altering in one migration:

```python
# Avoid: Complex migration doing multiple operations
operations = [
    migrations.AddField(model_name="cohort", name="cohort_type", field=models.CharField(max_length=20, choices=CHOICES)),
    migrations.AlterField(model_name="cohort", name="cohort_type", field=models.CharField(max_length=20, choices=CHOICES, null=False)),
]
```

Prefer incremental steps across separate PRs:
```python
# Step 1: Add nullable field
field=models.CharField(max_length=20, choices=CHOICES, null=True, blank=True)
# Step 2 (separate PR): Migrate existing data
# Step 3 (separate PR): Drop null constraint
```
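For step 2, a `RunPython` data migration can backfill the new column before any constraint lands. A minimal sketch; the dependency name and the backfill rule are hypothetical stand-ins:

```python
# Step 2 (separate PR): backfill the nullable field with a data migration
from django.db import migrations


def backfill_cohort_type(apps, schema_editor):
    # Historical models expose fields but not custom methods, so derive
    # the value from stored data rather than calling model helpers
    Cohort = apps.get_model("posthog", "Cohort")
    Cohort.objects.filter(cohort_type__isnull=True, is_static=True).update(cohort_type="static")


class Migration(migrations.Migration):
    dependencies = [("posthog", "0817_previous_migration")]  # hypothetical name

    operations = [
        # Reverse is a no-op: rolling back simply leaves the backfilled values in place
        migrations.RunPython(backfill_cohort_type, migrations.RunPython.noop),
    ]
```

Only after the backfill has run everywhere should the NOT NULL constraint ship (step 3).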
**One migration per PR** - Keep each PR focused on a single migration to avoid deployment synchronization issues.

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,41 @@
---
title: Test complex logic thoroughly
description: When implementing complex business logic, state management, or algorithms
with multiple edge cases, ensure comprehensive test coverage. Complex logic is prone
to bugs and difficult to reason about during code review, making tests essential
for maintainability and correctness.
repository: PostHog/posthog
label: Testing
language: TypeScript
comments_count: 3
repository_stars: 28460
---
When implementing complex business logic, state management, or algorithms with multiple edge cases, ensure comprehensive test coverage. Complex logic is prone to bugs and difficult to reason about during code review, making tests essential for maintainability and correctness.

Focus on:
- **Extract testable functions**: Break down complex logic into smaller, pure functions that can be easily unit tested
- **Cover edge cases**: Test boundary conditions, error scenarios, and potential infinite loops
- **Test state transitions**: For stateful logic, verify all possible state changes and their side effects

Example from state management code:

```typescript
// Instead of testing the entire complex state logic inline
initializeMessageStates: ({ inputCount, outputCount }) => {
    // Complex state calculation logic here...
}

// Extract into a testable utility function
const calculateMessageStates = (inputCount: number, outputCount: number) => {
    // Logic here - now easily testable
}

// Test the extracted function thoroughly
describe('calculateMessageStates', () => {
    it('should handle edge case where counts exceed limits', () => {
        // Test potential infinite loop scenario
    })
})
```
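To make the pattern concrete, here is a minimal runnable sketch (assuming a Jest-style runner); the clamping rule and `MAX_MESSAGES` limit are hypothetical stand-ins for the real state calculation:

```typescript
export function calculateMessageStates(inputCount: number, outputCount: number): number {
    // Clamp to a fixed ceiling so an out-of-range count can't drive an unbounded loop downstream
    const MAX_MESSAGES = 1000
    return Math.min(Math.max(inputCount + outputCount, 0), MAX_MESSAGES)
}

describe('calculateMessageStates', () => {
    it('clamps counts that exceed the limit', () => {
        expect(calculateMessageStates(900, 500)).toBe(1000)
    })

    it('never returns a negative count', () => {
        expect(calculateMessageStates(-5, 2)).toBe(0)
    })
})
```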
This approach makes code review easier by allowing reviewers to focus on the test cases to understand the expected behavior, rather than trying to mentally trace through complex logic paths.

View File

@@ -0,0 +1,46 @@
[
{
"discussion_id": "2283829963",
"pr_number": 36692,
"pr_file": "posthog/warehouse/api/test/test_data_warehouse.py",
"created_at": "2025-08-19T01:29:38+00:00",
"commented_code": "self.assertEqual(response.status_code, 500)\n self.assertEqual(data[\"error\"], \"An error occurred retrieving billing information\")\n+\n+ def test_recent_activity_includes_external_jobs_and_modeling_jobs(self):",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2283829963",
"repo_full_name": "PostHog/posthog",
"pr_number": 36692,
"pr_file": "posthog/warehouse/api/test/test_data_warehouse.py",
"discussion_id": "2283829963",
"commented_code": "@@ -57,3 +58,57 @@ def test_billing_exception_returns_500(self, mock_license, mock_billing_manager)\n \n self.assertEqual(response.status_code, 500)\n self.assertEqual(data[\"error\"], \"An error occurred retrieving billing information\")\n+\n+ def test_recent_activity_includes_external_jobs_and_modeling_jobs(self):",
"comment_created_at": "2025-08-19T01:29:38+00:00",
"comment_author": "EDsCODE",
"comment_body": "NIT: would be good to test some empty states (if either model is empty) and make sure pagination works (effectively validating the query logic is right)",
"pr_file_module": null
}
]
},
{
"discussion_id": "2241909890",
"pr_number": 35834,
"pr_file": "posthog/models/web_preaggregated/test_web_pre_aggregated_timezones.py",
"created_at": "2025-07-30T08:27:09+00:00",
"commented_code": "+from posthog.models import Team\n+from parameterized import parameterized\n+\n+from posthog.clickhouse.client.execute import sync_execute\n+from posthog.models.web_preaggregated.sql import (\n+ WEB_STATS_INSERT_SQL,\n+ WEB_BOUNCES_INSERT_SQL,\n+ WEB_STATS_DAILY_SQL,\n+ WEB_STATS_HOURLY_SQL,\n+ WEB_BOUNCES_DAILY_SQL,\n+ WEB_BOUNCES_HOURLY_SQL,\n+)\n+from posthog.models.utils import uuid7\n+from posthog.hogql_queries.web_analytics.test.web_preaggregated_test_base import WebAnalyticsPreAggregatedTestBase\n+from posthog.test.base import (\n+ _create_event,\n+ _create_person,\n+ flush_persons_and_events,\n+ snapshot_clickhouse_queries,\n+)\n+from posthog.hogql_queries.web_analytics.stats_table import WebStatsTableQueryRunner\n+from posthog.hogql_queries.web_analytics.test.test_web_stats_table import FloatAwareTestCase\n+from posthog.schema import (\n+ DateRange,\n+ WebStatsTableQuery,\n+ WebStatsBreakdown,\n+ HogQLQueryModifiers,\n+)\n+\n+\n+@snapshot_clickhouse_queries\n+class TestTimezonePreAggregatedIntegration(WebAnalyticsPreAggregatedTestBase, FloatAwareTestCase):",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2241909890",
"repo_full_name": "PostHog/posthog",
"pr_number": 35834,
"pr_file": "posthog/models/web_preaggregated/test_web_pre_aggregated_timezones.py",
"discussion_id": "2241909890",
"commented_code": "@@ -0,0 +1,342 @@\n+from posthog.models import Team\n+from parameterized import parameterized\n+\n+from posthog.clickhouse.client.execute import sync_execute\n+from posthog.models.web_preaggregated.sql import (\n+ WEB_STATS_INSERT_SQL,\n+ WEB_BOUNCES_INSERT_SQL,\n+ WEB_STATS_DAILY_SQL,\n+ WEB_STATS_HOURLY_SQL,\n+ WEB_BOUNCES_DAILY_SQL,\n+ WEB_BOUNCES_HOURLY_SQL,\n+)\n+from posthog.models.utils import uuid7\n+from posthog.hogql_queries.web_analytics.test.web_preaggregated_test_base import WebAnalyticsPreAggregatedTestBase\n+from posthog.test.base import (\n+ _create_event,\n+ _create_person,\n+ flush_persons_and_events,\n+ snapshot_clickhouse_queries,\n+)\n+from posthog.hogql_queries.web_analytics.stats_table import WebStatsTableQueryRunner\n+from posthog.hogql_queries.web_analytics.test.test_web_stats_table import FloatAwareTestCase\n+from posthog.schema import (\n+ DateRange,\n+ WebStatsTableQuery,\n+ WebStatsBreakdown,\n+ HogQLQueryModifiers,\n+)\n+\n+\n+@snapshot_clickhouse_queries\n+class TestTimezonePreAggregatedIntegration(WebAnalyticsPreAggregatedTestBase, FloatAwareTestCase):",
"comment_created_at": "2025-07-30T08:27:09+00:00",
"comment_author": "robbie-c",
"comment_body": "It'd be good to have a test to see what happens if someone tries to query with a team timezone of e.g. India +5:30, even if it's to assert that something throws an error. ",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,37 @@
---
title: Test edge cases comprehensively
description: Ensure your tests cover not just the happy path, but also edge cases,
empty states, error conditions, and boundary scenarios. This includes testing with
empty datasets, validating pagination logic, handling unusual input values, and
verifying error handling behavior.
repository: PostHog/posthog
label: Testing
language: Python
comments_count: 2
repository_stars: 28460
---
Ensure your tests cover not just the happy path, but also edge cases, empty states, error conditions, and boundary scenarios. This includes testing with empty datasets, validating pagination logic, handling unusual input values, and verifying error handling behavior.

When writing tests, systematically consider:
- Empty or null inputs (empty models, missing data)
- Boundary conditions (timezone edge cases like +5:30 offsets)
- Error scenarios and exception handling
- Pagination and query logic validation

Example from the discussions:
```python
def test_recent_activity_includes_external_jobs_and_modeling_jobs(self):
# Don't just test the happy path - also test:
# - Empty states (if either model is empty)
# - Pagination works correctly
# - Query logic handles edge cases properly
```

```python
def test_timezone_edge_cases(self):
# Test unusual timezone offsets like India +5:30
# Even if it's to assert that something throws an error
```
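A more concrete sketch of the empty-state case, assuming a DRF-style test client; the endpoint path and response shape are hypothetical:

```python
def test_recent_activity_empty_state(self):
    # No external or modeling jobs exist yet; the endpoint should still succeed
    response = self.client.get(f"/api/environments/{self.team.pk}/data_warehouse/recent_activity")
    self.assertEqual(response.status_code, 200)
    self.assertEqual(response.json()["results"], [])
```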
Comprehensive edge case testing catches bugs early, validates assumptions about system behavior, and ensures robust handling of real-world scenarios that may not be immediately obvious during development.

View File

@@ -0,0 +1,46 @@
[
{
"discussion_id": "2027343712",
"pr_number": 30764,
"pr_file": "posthog/settings/session_replay.py",
"created_at": "2025-04-03T16:13:54+00:00",
"commented_code": "# intended to allow testing of new releases of rrweb or our lazy loaded recording script\n SESSION_REPLAY_RRWEB_SCRIPT = get_from_env(\"SESSION_REPLAY_RRWEB_SCRIPT\", None, optional=True)\n \n-# a list of teams that are allowed to use the SESSION_REPLAY_RRWEB_SCRIPT\n-# can be a comma separated list of team ids or '*' to allow all teams\n-SESSION_REPLAY_RRWEB_SCRIPT_ALLOWED_TEAMS = get_list(get_from_env(\"SESSION_REPLAY_RRWEB_SCRIPT_ALLOWED_TEAMS\", \"\"))\n+# can be * for all teams or a number to limit to any team with an id less than the number\n+SESSION_REPLAY_RRWEB_SCRIPT_MAX_ALLOWED_TEAMS = get_from_env(\"SESSION_REPLAY_RRWEB_SCRIPT_MAX_ALLOWED_TEAMS\", \"-1\")",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2027343712",
"repo_full_name": "PostHog/posthog",
"pr_number": 30764,
"pr_file": "posthog/settings/session_replay.py",
"discussion_id": "2027343712",
"commented_code": "@@ -37,9 +37,8 @@\n # intended to allow testing of new releases of rrweb or our lazy loaded recording script\n SESSION_REPLAY_RRWEB_SCRIPT = get_from_env(\"SESSION_REPLAY_RRWEB_SCRIPT\", None, optional=True)\n \n-# a list of teams that are allowed to use the SESSION_REPLAY_RRWEB_SCRIPT\n-# can be a comma separated list of team ids or '*' to allow all teams\n-SESSION_REPLAY_RRWEB_SCRIPT_ALLOWED_TEAMS = get_list(get_from_env(\"SESSION_REPLAY_RRWEB_SCRIPT_ALLOWED_TEAMS\", \"\"))\n+# can be * for all teams or a number to limit to any team with an id less than the number\n+SESSION_REPLAY_RRWEB_SCRIPT_MAX_ALLOWED_TEAMS = get_from_env(\"SESSION_REPLAY_RRWEB_SCRIPT_MAX_ALLOWED_TEAMS\", \"-1\")",
"comment_created_at": "2025-04-03T16:13:54+00:00",
"comment_author": "pauldambra",
"comment_body": "i went for above and below a fixed team number\r\nwe could do something like the session id hashing in posthog-js for a repeatable above or below a sample rate but this felt simpler",
"pr_file_module": null
}
]
},
{
"discussion_id": "2273215707",
"pr_number": 36283,
"pr_file": "posthog/models/cohort/util.py",
"created_at": "2025-08-13T12:08:09+00:00",
"commented_code": "dfs(cohort_id, seen, sorted_cohort_ids)\n \n return sorted_cohort_ids\n+\n+\n+def get_dependent_cohorts_reverse(\n+ cohort: Cohort,\n+ using_database: str = \"default\",\n+) -> list[Cohort]:\n+ \"\"\"\n+ Get cohorts that depend on the given cohort (reverse dependencies).\n+ This is the opposite of get_dependent_cohorts - it finds cohorts that reference this one.\n+ \"\"\"\n+ from posthog.models.cohort.cohort import Cohort\n+ from django.db.models import Q\n+\n+ # Use database-level JSON query to filter cohorts that reference this one\n+ # This is much more efficient than loading all cohorts and checking in Python\n+ cohort_id_str = str(cohort.id)\n+\n+ # Build a query that checks if the filters JSON contains references to our cohort\n+ # We check both the new filters format and legacy groups format\n+ filter_conditions = Q()\n+\n+ # Check for cohort references in filters.properties structure\n+ filter_conditions |= Q(filters__icontains=f'\"value\": {cohort.id}')\n+ filter_conditions |= Q(filters__icontains=f'\"value\": \"{cohort_id_str}\"')\n+\n+ # Also check legacy groups format for backward compatibility\n+ filter_conditions |= Q(groups__icontains=f'\"value\": {cohort.id}')\n+ filter_conditions |= Q(groups__icontains=f'\"value\": \"{cohort_id_str}\"')\n+\n+ # Get potentially dependent cohorts using database filtering\n+ candidate_cohorts = (\n+ Cohort.objects.db_manager(using_database)\n+ .filter(filter_conditions, team=cohort.team, deleted=False)\n+ .exclude(id=cohort.id)\n+ )\n+\n+ dependent_cohorts = []\n+\n+ # Now verify the matches (since JSON icontains can have false positives)\n+ for candidate_cohort in candidate_cohorts:\n+ # Check if this cohort actually references our target cohort\n+ for prop in candidate_cohort.properties.flat:\n+ if prop.type == \"cohort\" and not isinstance(prop.value, list):\n+ try:\n+ referenced_cohort_id = int(prop.value)\n+ if referenced_cohort_id == cohort.id:\n+ dependent_cohorts.append(candidate_cohort)\n+ break # Found dependency, no need to check more properties\n+ except (ValueError, TypeError):\n+ continue\n+\n+ return dependent_cohorts\n+\n+\n+def get_minimum_required_type_for_dependency(dependency_type: str, current_type: str) -> str:\n+ \"\"\"\n+ Determine the minimum required cohort type when a dependency changes type.\n+\n+ Args:\n+ dependency_type: The new type of the dependency\n+ current_type: The current type of the dependent cohort\n+\n+ Returns:\n+ The minimum required type for the dependent cohort\n+ \"\"\"\n+ type_hierarchy = {\n+ \"static\": 0,\n+ \"person_property\": 1,\n+ \"behavioral\": 2,\n+ \"analytical\": 3,\n+ }\n+\n+ dependency_level = type_hierarchy.get(dependency_type, 3)\n+ current_level = type_hierarchy.get(current_type, 0)\n+\n+ # The dependent cohort must be at least as complex as its dependencies\n+ required_level = max(dependency_level, current_level)\n+\n+ # Convert back to type string\n+ level_to_type = {v: k for k, v in type_hierarchy.items()}\n+ return level_to_type.get(required_level, \"analytical\")",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2273215707",
"repo_full_name": "PostHog/posthog",
"pr_number": 36283,
"pr_file": "posthog/models/cohort/util.py",
"discussion_id": "2273215707",
"commented_code": "@@ -586,3 +586,140 @@ def dfs(node, seen, sorted_arr):\n dfs(cohort_id, seen, sorted_cohort_ids)\n \n return sorted_cohort_ids\n+\n+\n+def get_dependent_cohorts_reverse(\n+ cohort: Cohort,\n+ using_database: str = \"default\",\n+) -> list[Cohort]:\n+ \"\"\"\n+ Get cohorts that depend on the given cohort (reverse dependencies).\n+ This is the opposite of get_dependent_cohorts - it finds cohorts that reference this one.\n+ \"\"\"\n+ from posthog.models.cohort.cohort import Cohort\n+ from django.db.models import Q\n+\n+ # Use database-level JSON query to filter cohorts that reference this one\n+ # This is much more efficient than loading all cohorts and checking in Python\n+ cohort_id_str = str(cohort.id)\n+\n+ # Build a query that checks if the filters JSON contains references to our cohort\n+ # We check both the new filters format and legacy groups format\n+ filter_conditions = Q()\n+\n+ # Check for cohort references in filters.properties structure\n+ filter_conditions |= Q(filters__icontains=f'\"value\": {cohort.id}')\n+ filter_conditions |= Q(filters__icontains=f'\"value\": \"{cohort_id_str}\"')\n+\n+ # Also check legacy groups format for backward compatibility\n+ filter_conditions |= Q(groups__icontains=f'\"value\": {cohort.id}')\n+ filter_conditions |= Q(groups__icontains=f'\"value\": \"{cohort_id_str}\"')\n+\n+ # Get potentially dependent cohorts using database filtering\n+ candidate_cohorts = (\n+ Cohort.objects.db_manager(using_database)\n+ .filter(filter_conditions, team=cohort.team, deleted=False)\n+ .exclude(id=cohort.id)\n+ )\n+\n+ dependent_cohorts = []\n+\n+ # Now verify the matches (since JSON icontains can have false positives)\n+ for candidate_cohort in candidate_cohorts:\n+ # Check if this cohort actually references our target cohort\n+ for prop in candidate_cohort.properties.flat:\n+ if prop.type == \"cohort\" and not isinstance(prop.value, list):\n+ try:\n+ referenced_cohort_id = int(prop.value)\n+ if referenced_cohort_id == cohort.id:\n+ dependent_cohorts.append(candidate_cohort)\n+ break # Found dependency, no need to check more properties\n+ except (ValueError, TypeError):\n+ continue\n+\n+ return dependent_cohorts\n+\n+\n+def get_minimum_required_type_for_dependency(dependency_type: str, current_type: str) -> str:\n+ \"\"\"\n+ Determine the minimum required cohort type when a dependency changes type.\n+\n+ Args:\n+ dependency_type: The new type of the dependency\n+ current_type: The current type of the dependent cohort\n+\n+ Returns:\n+ The minimum required type for the dependent cohort\n+ \"\"\"\n+ type_hierarchy = {\n+ \"static\": 0,\n+ \"person_property\": 1,\n+ \"behavioral\": 2,\n+ \"analytical\": 3,\n+ }\n+\n+ dependency_level = type_hierarchy.get(dependency_type, 3)\n+ current_level = type_hierarchy.get(current_type, 0)\n+\n+ # The dependent cohort must be at least as complex as its dependencies\n+ required_level = max(dependency_level, current_level)\n+\n+ # Convert back to type string\n+ level_to_type = {v: k for k, v in type_hierarchy.items()}\n+ return level_to_type.get(required_level, \"analytical\")",
"comment_created_at": "2025-08-13T12:08:09+00:00",
"comment_author": "meikelmosby",
"comment_body": "this is pretty hard to parse.. can we type it and also maybe structure it a bit neater like\n\n```\n// possibly a type already exists somewhere .. \nCohortType = Literal[\"static\", \"person_property\", \"behavioral\", \"analytical\"]\n\ndef get_minimum_required_type_for_dependency(\n dependency_type: CohortType,\n current_type: CohortType\n) -> CohortType:\n \"\"\"\n Determine the minimum required cohort type when a dependency changes type.\n\n Ensures the dependent cohort is at least as complex as the dependency.\n\n Args:\n dependency_type: The new type of the dependency\n current_type: The current type of the dependent cohort\n\n Returns:\n The minimum required type for the dependent cohort\n \"\"\"\n type_hierarchy: Dict[CohortType, int] = {\n \"static\": 0,\n \"person_property\": 1,\n \"behavioral\": 2,\n \"analytical\": 3,\n }\n\n level_to_type: Dict[int, CohortType] = {level: type_name for type_name, level in type_hierarchy.items()}\n\n dependency_level: int = type_hierarchy[dependency_type]\n current_level: int = type_hierarchy[current_type]\n\n required_level: int = max(dependency_level, current_level)\n\n return level_to_type[required_level]\n```\n\nthat way we also dont need the fallbacks or so bc we should only be able to pass in known types, no?",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,50 @@
---
title: Two-phase filtering algorithms
description: 'When working with large datasets or complex matching operations, implement
algorithms that use a two-phase approach: first filter candidates using efficient
broad criteria, then verify matches with precise logic. This pattern significantly
improves performance by reducing the number of expensive operations.'
repository: PostHog/posthog
label: Algorithms
language: Python
comments_count: 2
repository_stars: 28460
---
When working with large datasets or complex matching operations, implement algorithms that use a two-phase approach: first filter candidates using efficient broad criteria, then verify matches with precise logic. This pattern significantly improves performance by reducing the number of expensive operations.

The approach is particularly effective when:
- Database-level filtering can eliminate most irrelevant records
- Verification logic is computationally expensive
- Memory usage needs to be minimized

Example implementation:
```python
from django.db.models import Q

from posthog.models.cohort.cohort import Cohort


def get_dependent_cohorts_reverse(cohort: Cohort) -> list[Cohort]:
    # Phase 1: Database-level filtering using broad criteria
    # (cheap icontains matching; may include false positives)
    filter_conditions = Q()
    filter_conditions |= Q(filters__icontains=f'"value": {cohort.id}')
    filter_conditions |= Q(filters__icontains=f'"value": "{str(cohort.id)}"')
    candidate_cohorts = (
        Cohort.objects.filter(filter_conditions, team=cohort.team, deleted=False)
        .exclude(id=cohort.id)
    )

    # Phase 2: Precise verification of the filtered candidates
    dependent_cohorts = []
    for candidate_cohort in candidate_cohorts:
        for prop in candidate_cohort.properties.flat:
            if prop.type == "cohort" and not isinstance(prop.value, list):
                try:
                    if int(prop.value) == cohort.id:
                        dependent_cohorts.append(candidate_cohort)
                        break
                except (ValueError, TypeError):
                    continue

    return dependent_cohorts
```
This pattern avoids loading all records into memory and performing expensive operations on irrelevant data, instead using the database's indexing and filtering capabilities first.

View File

@@ -0,0 +1,24 @@
[
{
"discussion_id": "2245817015",
"pr_number": 35953,
"pr_file": "plugin-server/src/cdp/consumers/cdp-source-webhooks.consumer.ts",
"created_at": "2025-07-31T16:10:29+00:00",
"commented_code": "import { createInvocation, createInvocationResult } from '../utils/invocation-utils'\n import { CdpConsumerBase } from './cdp-base.consumer'\n \n+const DISALLOWED_HEADERS = ['x-forwarded-for', 'x-forwarded-host', 'x-forwarded-proto', 'x-forwarded-port', 'cookie']",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2245817015",
"repo_full_name": "PostHog/posthog",
"pr_number": 35953,
"pr_file": "plugin-server/src/cdp/consumers/cdp-source-webhooks.consumer.ts",
"discussion_id": "2245817015",
"commented_code": "@@ -16,6 +16,8 @@ import { createAddLogFunction } from '../utils'\n import { createInvocation, createInvocationResult } from '../utils/invocation-utils'\n import { CdpConsumerBase } from './cdp-base.consumer'\n \n+const DISALLOWED_HEADERS = ['x-forwarded-for', 'x-forwarded-host', 'x-forwarded-proto', 'x-forwarded-port', 'cookie']",
"comment_created_at": "2025-07-31T16:10:29+00:00",
"comment_author": "Piccirello",
"comment_body": "May want to include the below, to be safe:\r\n\r\n- `x-csrftoken`\r\n- `authorization`\r\n- `proxy-authorization`\r\n- `referer`\r\n- `forwarded`\r\n- `x-real-ip`\r\n- `true-client-ip`\r\n\r\nThis list can change as our infra changes (e.g. if we start using Cloudflare that adds some additional headers). It would be much preferable to operate off of an allowlist. I don't have a great understanding of the current use case so it's hard to say exactly, but something like this\r\n```ts\r\nconst ALLOWED_HEADERS = ['Accept', 'Accept-Encoding', 'Accept-Language', 'Cache-Control', 'Pragma', 'Content-Type', 'Content-Length', 'Content-Encoding', 'Content-Language', 'User-Agent', 'Host', 'Date']\r\n```",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,29 @@
---
title: Use allowlists over blocklists
description: When filtering data for security purposes, prefer allowlists (explicitly
defining what is permitted) over blocklists (explicitly defining what is forbidden).
Blocklists are inherently less secure because they can miss new threats, require
constant maintenance as infrastructure changes, and operate on the assumption that
anything not explicitly blocked is...
repository: PostHog/posthog
label: Security
language: TypeScript
comments_count: 1
repository_stars: 28460
---
When filtering data for security purposes, prefer allowlists (explicitly defining what is permitted) over blocklists (explicitly defining what is forbidden). Blocklists are inherently less secure because they can miss new threats, require constant maintenance as infrastructure changes, and operate on the assumption that anything not explicitly blocked is safe.

Allowlists are more secure because they operate on the principle of least privilege - only explicitly permitted items are allowed, and everything else is automatically rejected. This approach is particularly important when handling user input, HTTP headers, API parameters, or any external data.

Example of converting from blocklist to allowlist for HTTP headers:
```ts
// Insecure: blocklist approach
const DISALLOWED_HEADERS = ['x-forwarded-for', 'x-forwarded-host', 'x-forwarded-proto', 'cookie']

// Secure: allowlist approach
const ALLOWED_HEADERS = ['Accept', 'Accept-Encoding', 'Accept-Language', 'Cache-Control', 'Pragma', 'Content-Type', 'Content-Length', 'Content-Encoding', 'Content-Language', 'User-Agent', 'Host', 'Date']
```
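To make the allowlist operative, drop anything not on it. A minimal sketch; since HTTP header names are case-insensitive, compare them lowercased:

```ts
const ALLOWED = new Set(ALLOWED_HEADERS.map((h) => h.toLowerCase()))

function filterHeaders(headers: Record<string, string>): Record<string, string> {
    // Everything not explicitly permitted is rejected by default
    return Object.fromEntries(
        Object.entries(headers).filter(([name]) => ALLOWED.has(name.toLowerCase()))
    )
}
```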
Apply this principle when filtering file extensions, API endpoints, database fields, configuration options, or any scenario where you need to control what data is processed or passed through your system.

View File

@@ -0,0 +1,260 @@
[
{
"discussion_id": "2283612109",
"pr_number": 36693,
"pr_file": "posthog/api/feature_flag.py",
"created_at": "2025-08-18T22:12:11+00:00",
"commented_code": "except FeatureFlag.DoesNotExist:\n raise serializers.ValidationError(f\"Flag dependency references non-existent flag with ID {flag_id}\")\n \n+ def _get_properties_from_filters(self, filters: dict, property_type: str | None = None):\n+ \"\"\"\n+ Extract properties from filters by iterating through groups.\n+\n+ Args:\n+ filters: The filters dictionary containing groups\n+ property_type: Optional filter by property type (e.g., 'flag', 'cohort')",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2283612109",
"repo_full_name": "PostHog/posthog",
"pr_number": 36693,
"pr_file": "posthog/api/feature_flag.py",
"discussion_id": "2283612109",
"commented_code": "@@ -422,16 +425,34 @@ def _validate_flag_reference(self, flag_reference):\n except FeatureFlag.DoesNotExist:\n raise serializers.ValidationError(f\"Flag dependency references non-existent flag with ID {flag_id}\")\n \n+ def _get_properties_from_filters(self, filters: dict, property_type: str | None = None):\n+ \"\"\"\n+ Extract properties from filters by iterating through groups.\n+\n+ Args:\n+ filters: The filters dictionary containing groups\n+ property_type: Optional filter by property type (e.g., 'flag', 'cohort')",
"comment_created_at": "2025-08-18T22:12:11+00:00",
"comment_author": "dmarticus",
"comment_body": "[nit] this property type could probably be an StrEnum instead of a string.",
"pr_file_module": null
}
]
},
{
"discussion_id": "2272750415",
"pr_number": 36480,
"pr_file": "posthog/hogql/database/join_functions.py",
"created_at": "2025-08-13T09:46:09+00:00",
"commented_code": "+from collections.abc import Callable\n+from typing import Any, Literal, Optional, TypeVar, overload, cast\n+from pydantic import BaseModel\n+\n+\n+class LazyJoinFunctionSerialConfig(BaseModel):\n+ type: Literal[\"join_function\"] = \"join_function\"\n+ name: str\n+\n+\n+class LazyJoinClosureSerialConfig(BaseModel):\n+ type: Literal[\"closure\"] = \"closure\"\n+ name: str\n+ args: tuple[Any, ...]\n+\n+\n+REGISTERED_JOIN_FUNCTIONS: dict[str, Callable] = {}\n+\n+\n+REGISTERED_JOIN_CLOSURES: dict[str, Callable] = {}\n+\n+_F = TypeVar(\"_F\", bound=Callable)\n+\n+\n+@overload\n+def register_join_function(_func: _F) -> _F: ...\n+\n+\n+@overload\n+def register_join_function(*, name: Optional[str] = ..., closure: bool = ...) -> Callable[[_F], _F]: ...\n+\n+\n+def register_join_function(_func: Optional[_F] = None, *, name: Optional[str] = None, closure: bool = False):\n+ \"\"\"\n+ Decorator to register a join function in the allowlist.\n+\n+ Usage:\n+ - @register_join_function\n+ - @register_join_function()\n+ - @register_join_function(name=\"custom_name\")\n+ - @register_join_function(closure=True) # for factory functions returning a join callable",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2272750415",
"repo_full_name": "PostHog/posthog",
"pr_number": 36480,
"pr_file": "posthog/hogql/database/join_functions.py",
"discussion_id": "2272750415",
"commented_code": "@@ -0,0 +1,63 @@\n+from collections.abc import Callable\n+from typing import Any, Literal, Optional, TypeVar, overload, cast\n+from pydantic import BaseModel\n+\n+\n+class LazyJoinFunctionSerialConfig(BaseModel):\n+ type: Literal[\"join_function\"] = \"join_function\"\n+ name: str\n+\n+\n+class LazyJoinClosureSerialConfig(BaseModel):\n+ type: Literal[\"closure\"] = \"closure\"\n+ name: str\n+ args: tuple[Any, ...]\n+\n+\n+REGISTERED_JOIN_FUNCTIONS: dict[str, Callable] = {}\n+\n+\n+REGISTERED_JOIN_CLOSURES: dict[str, Callable] = {}\n+\n+_F = TypeVar(\"_F\", bound=Callable)\n+\n+\n+@overload\n+def register_join_function(_func: _F) -> _F: ...\n+\n+\n+@overload\n+def register_join_function(*, name: Optional[str] = ..., closure: bool = ...) -> Callable[[_F], _F]: ...\n+\n+\n+def register_join_function(_func: Optional[_F] = None, *, name: Optional[str] = None, closure: bool = False):\n+ \"\"\"\n+ Decorator to register a join function in the allowlist.\n+\n+ Usage:\n+ - @register_join_function\n+ - @register_join_function()\n+ - @register_join_function(name=\"custom_name\")\n+ - @register_join_function(closure=True) # for factory functions returning a join callable",
"comment_created_at": "2025-08-13T09:46:09+00:00",
"comment_author": "Gilbert09",
"comment_body": "I wasn't sure what the closure param meant until I read this comment - renaming it to `is_factory=True` is probably more familiar language here?",
"pr_file_module": null
}
]
},
{
"discussion_id": "2278675298",
"pr_number": 36608,
"pr_file": "posthog/models/cache.py",
"created_at": "2025-08-15T09:24:17+00:00",
"commented_code": "+from typing import TYPE_CHECKING, Optional\n+from django.core import serializers\n+from django.db.models import QuerySet, Manager\n+import posthoganalytics\n+from prometheus_client import Counter\n+\n+from posthog.exceptions_capture import capture_exception\n+from posthog.git import get_git_commit_short\n+from posthog.redis import get_client\n+from posthog.settings import TEST\n+\n+if TYPE_CHECKING:\n+ from posthog.models import Team\n+\n+\n+DATABASE_CACHE_COUNTER = Counter(\n+ \"posthog_get_model_cache\",",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2278675298",
"repo_full_name": "PostHog/posthog",
"pr_number": 36608,
"pr_file": "posthog/models/cache.py",
"discussion_id": "2278675298",
"commented_code": "@@ -0,0 +1,104 @@\n+from typing import TYPE_CHECKING, Optional\n+from django.core import serializers\n+from django.db.models import QuerySet, Manager\n+import posthoganalytics\n+from prometheus_client import Counter\n+\n+from posthog.exceptions_capture import capture_exception\n+from posthog.git import get_git_commit_short\n+from posthog.redis import get_client\n+from posthog.settings import TEST\n+\n+if TYPE_CHECKING:\n+ from posthog.models import Team\n+\n+\n+DATABASE_CACHE_COUNTER = Counter(\n+ \"posthog_get_model_cache\",",
"comment_created_at": "2025-08-15T09:24:17+00:00",
"comment_author": "Gilbert09",
"comment_body": "`cache_hit` and `cache_miss` are the more typical terms for whether an item was found in the cache or not ",
"pr_file_module": null
}
]
},
{
"discussion_id": "2270732761",
"pr_number": 36374,
"pr_file": "products/surveys/backend/max_tools.py",
"created_at": "2025-08-12T18:11:32+00:00",
"commented_code": "survey_data[\"appearance\"] = appearance\n \n return survey_data\n+\n+\n+class FeatureFlagToolkit(TaxonomyAgentToolkit):",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2270732761",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "products/surveys/backend/max_tools.py",
"discussion_id": "2270732761",
"commented_code": "@@ -167,3 +151,133 @@ def _prepare_survey_data(self, survey_schema: SurveyCreationSchema, team: Team)\n survey_data[\"appearance\"] = appearance\n \n return survey_data\n+\n+\n+class FeatureFlagToolkit(TaxonomyAgentToolkit):",
"comment_created_at": "2025-08-12T18:11:32+00:00",
"comment_author": "lucasheriques",
"comment_body": "I think the naming here is confusing. This doesn't seem to be a FeatureFlagToolkit, but instead `SurveyToolkit`. Reasoning:\r\n\r\n- Looks like this is responsible for executing sending the system prompt about survey creation, plus having other tools to help find more information\r\n- One of these tools is looking up a freature flag id\r\n- But in the future, we might add more tools to look up more things I think?\r\n\r\nfar from an expert here, so let me know if this makes sense. cc @denakorita ",
"pr_file_module": null
},
{
"comment_id": "2272473578",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "products/surveys/backend/max_tools.py",
"discussion_id": "2270732761",
"commented_code": "@@ -167,3 +151,133 @@ def _prepare_survey_data(self, survey_schema: SurveyCreationSchema, team: Team)\n survey_data[\"appearance\"] = appearance\n \n return survey_data\n+\n+\n+class FeatureFlagToolkit(TaxonomyAgentToolkit):",
"comment_created_at": "2025-08-13T08:18:40+00:00",
"comment_author": "denakorita",
"comment_body": "Yeap, agreed ",
"pr_file_module": null
},
{
"comment_id": "2272538049",
"repo_full_name": "PostHog/posthog",
"pr_number": 36374,
"pr_file": "products/surveys/backend/max_tools.py",
"discussion_id": "2270732761",
"commented_code": "@@ -167,3 +151,133 @@ def _prepare_survey_data(self, survey_schema: SurveyCreationSchema, team: Team)\n survey_data[\"appearance\"] = appearance\n \n return survey_data\n+\n+\n+class FeatureFlagToolkit(TaxonomyAgentToolkit):",
"comment_created_at": "2025-08-13T08:42:32+00:00",
"comment_author": "marandaneto",
"comment_body": "will rename",
"pr_file_module": null
}
]
},
{
"discussion_id": "2247447902",
"pr_number": 35401,
"pr_file": "products/issue_tracker/backend/models.py",
"created_at": "2025-08-01T09:29:22+00:00",
"commented_code": "+from django.db import models\n+from django.utils import timezone\n+import uuid\n+\n+\n+class Issue(models.Model):\n+ class Status(models.TextChoices):\n+ BACKLOG = \"backlog\", \"Backlog\"\n+ TODO = \"todo\", \"To Do\"\n+ IN_PROGRESS = \"in_progress\", \"In Progress\"\n+ TESTING = \"testing\", \"Testing\"\n+ DONE = \"done\", \"Done\"\n+\n+ class OriginProduct(models.TextChoices):",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2247447902",
"repo_full_name": "PostHog/posthog",
"pr_number": 35401,
"pr_file": "products/issue_tracker/backend/models.py",
"discussion_id": "2247447902",
"commented_code": "@@ -0,0 +1,184 @@\n+from django.db import models\n+from django.utils import timezone\n+import uuid\n+\n+\n+class Issue(models.Model):\n+ class Status(models.TextChoices):\n+ BACKLOG = \"backlog\", \"Backlog\"\n+ TODO = \"todo\", \"To Do\"\n+ IN_PROGRESS = \"in_progress\", \"In Progress\"\n+ TESTING = \"testing\", \"Testing\"\n+ DONE = \"done\", \"Done\"\n+\n+ class OriginProduct(models.TextChoices):",
"comment_created_at": "2025-08-01T09:29:22+00:00",
"comment_author": "daibhin",
"comment_body": "Not all of these are products\r\n```suggestion\r\n class Origin(models.TextChoices):\r\n```",
"pr_file_module": null
}
]
},
{
"discussion_id": "2263656715",
"pr_number": 36354,
"pr_file": "posthog/models/feature_flag/local_evaluation.py",
"created_at": "2025-08-08T17:38:25+00:00",
"commented_code": "+from django.conf import settings",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2263656715",
"repo_full_name": "PostHog/posthog",
"pr_number": 36354,
"pr_file": "posthog/models/feature_flag/local_evaluation.py",
"discussion_id": "2263656715",
"commented_code": "@@ -0,0 +1,174 @@\n+from django.conf import settings",
"comment_created_at": "2025-08-08T17:38:25+00:00",
"comment_author": "haacked",
"comment_body": "This doesn't have to happen in this PR. I'm more trying to get more people's opinion on this:\r\n\r\nI've been wanting to change the name to `flag_definitions` because \"local evaluation\" is the process that happens in the SDKs, but \"flag definitions\" is what the SDKs are requesting and what this code generates.\r\n\r\nThoughts?",
"pr_file_module": null
}
]
},
{
"discussion_id": "2263660225",
"pr_number": 36354,
"pr_file": "posthog/models/feature_flag/local_evaluation.py",
"created_at": "2025-08-08T17:40:38+00:00",
"commented_code": "+from django.conf import settings\n+from django.db.models.signals import post_save\n+from django.dispatch import receiver\n+import structlog\n+\n+from django.db.models import Q\n+\n+from posthog.models.cohort.cohort import Cohort, CohortOrEmpty\n+from posthog.models.feature_flag import FeatureFlag\n+from posthog.models.group_type_mapping import GroupTypeMapping\n+from posthog.models.team import Team\n+from posthog.storage.hypercache import HyperCache\n+\n+logger = structlog.get_logger(__name__)\n+\n+DATABASE_FOR_LOCAL_EVALUATION = (\n+ \"default\"\n+ if (\"local_evaluation\" not in settings.READ_REPLICA_OPT_IN or \"replica\" not in settings.DATABASES) # noqa: F821\n+ else \"replica\"\n+)\n+\n+flags_hypercache = HyperCache(",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2263660225",
"repo_full_name": "PostHog/posthog",
"pr_number": 36354,
"pr_file": "posthog/models/feature_flag/local_evaluation.py",
"discussion_id": "2263660225",
"commented_code": "@@ -0,0 +1,174 @@\n+from django.conf import settings\n+from django.db.models.signals import post_save\n+from django.dispatch import receiver\n+import structlog\n+\n+from django.db.models import Q\n+\n+from posthog.models.cohort.cohort import Cohort, CohortOrEmpty\n+from posthog.models.feature_flag import FeatureFlag\n+from posthog.models.group_type_mapping import GroupTypeMapping\n+from posthog.models.team import Team\n+from posthog.storage.hypercache import HyperCache\n+\n+logger = structlog.get_logger(__name__)\n+\n+DATABASE_FOR_LOCAL_EVALUATION = (\n+ \"default\"\n+ if (\"local_evaluation\" not in settings.READ_REPLICA_OPT_IN or \"replica\" not in settings.DATABASES) # noqa: F821\n+ else \"replica\"\n+)\n+\n+flags_hypercache = HyperCache(",
"comment_created_at": "2025-08-08T17:40:38+00:00",
"comment_author": "haacked",
"comment_body": "It'd be a little clearer to name this `flags_with_cohorts_hypercache`.",
"pr_file_module": null
}
]
},
{
"discussion_id": "2262245330",
"pr_number": 36339,
"pr_file": "posthog/models/personal_api_key.py",
"created_at": "2025-08-08T08:09:49+00:00",
"commented_code": "null=True,\n blank=True,\n )\n+\n+\n+def find(token: str) -> tuple[PersonalAPIKey, str] | None:",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2262245330",
"repo_full_name": "PostHog/posthog",
"pr_number": 36339,
"pr_file": "posthog/models/personal_api_key.py",
"discussion_id": "2262245330",
"commented_code": "@@ -62,3 +62,20 @@ class PersonalAPIKey(models.Model):\n null=True,\n blank=True,\n )\n+\n+\n+def find(token: str) -> tuple[PersonalAPIKey, str] | None:",
"comment_created_at": "2025-08-08T08:09:49+00:00",
"comment_author": "joshsny",
"comment_body": "nit: would be nice if this had a more descriptive name",
"pr_file_module": null
},
{
"comment_id": "2262525837",
"repo_full_name": "PostHog/posthog",
"pr_number": 36339,
"pr_file": "posthog/models/personal_api_key.py",
"discussion_id": "2262245330",
"commented_code": "@@ -62,3 +62,20 @@ class PersonalAPIKey(models.Model):\n null=True,\n blank=True,\n )\n+\n+\n+def find(token: str) -> tuple[PersonalAPIKey, str] | None:",
"comment_created_at": "2025-08-08T10:03:07+00:00",
"comment_author": "zlwaterfield",
"comment_body": "+1",
"pr_file_module": null
},
{
"comment_id": "2263655699",
"repo_full_name": "PostHog/posthog",
"pr_number": 36339,
"pr_file": "posthog/models/personal_api_key.py",
"discussion_id": "2262245330",
"commented_code": "@@ -62,3 +62,20 @@ class PersonalAPIKey(models.Model):\n null=True,\n blank=True,\n )\n+\n+\n+def find(token: str) -> tuple[PersonalAPIKey, str] | None:",
"comment_created_at": "2025-08-08T17:37:43+00:00",
"comment_author": "Piccirello",
"comment_body": "I'm surprised given that this is already scoped to the `PersonalAPIKey` model. But will rename to `find_personal_api_key` - unless you had something else in mind?",
"pr_file_module": null
}
]
},
{
"discussion_id": "2247282068",
"pr_number": 35980,
"pr_file": "posthog/models/notebook/notebook_relationship.py",
"created_at": "2025-08-01T08:11:32+00:00",
"commented_code": "+from django.core.exceptions import ValidationError\n+from django.db import models\n+from posthog.models.utils import UUIDModel, build_unique_relationship_check, build_partial_uniqueness_constraint\n+\n+RELATED_OBJECTS = (\"group\",)\n+\n+\n+class NotebookRelationship(UUIDModel):",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2247282068",
"repo_full_name": "PostHog/posthog",
"pr_number": 35980,
"pr_file": "posthog/models/notebook/notebook_relationship.py",
"discussion_id": "2247282068",
"commented_code": "@@ -0,0 +1,73 @@\n+from django.core.exceptions import ValidationError\n+from django.db import models\n+from posthog.models.utils import UUIDModel, build_unique_relationship_check, build_partial_uniqueness_constraint\n+\n+RELATED_OBJECTS = (\"group\",)\n+\n+\n+class NotebookRelationship(UUIDModel):",
"comment_created_at": "2025-08-01T08:11:32+00:00",
"comment_author": "daibhin",
"comment_body": "Is `ResourceNotebook` a better name here? I think it's pretty clear this is a relationship so don't necessarily love it in the model name. We often call objects in posthog (groups, replays, issues) \"resources\"",
"pr_file_module": null
},
{
"comment_id": "2247817499",
"repo_full_name": "PostHog/posthog",
"pr_number": 35980,
"pr_file": "posthog/models/notebook/notebook_relationship.py",
"discussion_id": "2247282068",
"commented_code": "@@ -0,0 +1,73 @@\n+from django.core.exceptions import ValidationError\n+from django.db import models\n+from posthog.models.utils import UUIDModel, build_unique_relationship_check, build_partial_uniqueness_constraint\n+\n+RELATED_OBJECTS = (\"group\",)\n+\n+\n+class NotebookRelationship(UUIDModel):",
"comment_created_at": "2025-08-01T12:08:56+00:00",
"comment_author": "arthurdedeus",
"comment_body": "> We often call objects in posthog (groups, replays, issues) \"resources\"\r\n\r\nToday I learned! I think it makes sense to rename",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,47 @@
---
title: Use descriptive names
description: Choose names that clearly communicate purpose and accurately represent
what they describe. Avoid ambiguous or misleading names that require additional
context to understand.
repository: PostHog/posthog
label: Naming Conventions
language: Python
comments_count: 9
repository_stars: 28460
---
Choose names that clearly communicate purpose and accurately represent what they describe. Avoid ambiguous or misleading names that require additional context to understand.

Key principles:
- **Be specific**: Use precise terms that indicate the actual purpose or content
- **Avoid generic terms**: Replace vague names with descriptive alternatives
- **Match semantic meaning**: Ensure names accurately reflect what they represent

Examples of improvements:
```python
# Too generic/ambiguous
property_type: str  # What kind of property type?
closure: bool  # What does this boolean represent?
key = name or "default"  # Key for what? Too generic

# More descriptive
property_type: PropertyTypeEnum  # Use enum for type safety
is_factory: bool  # Clearly indicates factory pattern
registration_key = f"{module_name}#{function_name}"  # Specific and unique

# Misleading names
class FeatureFlagToolkit:  # Actually handles surveys
    pass

class OriginProduct:  # Not all values are products
    pass

# Accurate names
class SurveyToolkit:  # Accurately describes purpose
    pass

class Origin:  # Broader, more accurate term
    pass
```
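Where a bare string invites ambiguity, a `StrEnum` (Python 3.11+) keeps the descriptive name and adds type safety. A minimal sketch; the member list is illustrative:

```python
from enum import StrEnum


class PropertyType(StrEnum):
    FLAG = "flag"
    COHORT = "cohort"
    PERSON = "person"


def get_properties_from_filters(filters: dict, property_type: PropertyType | None = None):
    # Callers now pass PropertyType.FLAG rather than a bare "flag" string
    ...
```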
When naming is unclear, consider: "Would a new developer understand this name without additional context?" If not, choose a more descriptive alternative.

View File

@@ -0,0 +1,46 @@
[
{
"discussion_id": "2266647379",
"pr_number": 36401,
"pr_file": "rust/batch-import-worker/src/error/mod.rs",
"created_at": "2025-08-11T12:53:34+00:00",
"commented_code": "DEFAULT_USER_ERROR_MESSAGE\n }\n \n+#[derive(Error, Debug)]\n+#[error(\"Rate limited\")]\n+pub struct RateLimitedError {\n+ pub retry_after: Option<Duration>,\n+ #[source]\n+ pub source: reqwest::Error,\n+}\n+\n+/// Extracts a Retry-After duration if a RateLimitedError is present in the error chain\n+pub fn extract_retry_after_from_error(error: &anyhow::Error) -> Option<Duration> {",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2266647379",
"repo_full_name": "PostHog/posthog",
"pr_number": 36401,
"pr_file": "rust/batch-import-worker/src/error/mod.rs",
"discussion_id": "2266647379",
"commented_code": "@@ -42,10 +43,61 @@ pub fn get_user_message(error: &anyhow::Error) -> &str {\n DEFAULT_USER_ERROR_MESSAGE\n }\n \n+#[derive(Error, Debug)]\n+#[error(\"Rate limited\")]\n+pub struct RateLimitedError {\n+ pub retry_after: Option<Duration>,\n+ #[source]\n+ pub source: reqwest::Error,\n+}\n+\n+/// Extracts a Retry-After duration if a RateLimitedError is present in the error chain\n+pub fn extract_retry_after_from_error(error: &anyhow::Error) -> Option<Duration> {",
"comment_created_at": "2025-08-11T12:53:34+00:00",
"comment_author": "jose-sequeira",
"comment_body": "Think using `chain()` iterator is more straightforward than managing source()\n\n```suggestion\npub fn extract_retry_after_from_error(error: &anyhow::Error) -> Option<Duration> {\n error\n .chain()\n .find_map(|e| e.downcast_ref::<RateLimitedError>())\n .and_then(|rl| rl.retry_after)\n}\n```",
"pr_file_module": null
}
]
},
{
"discussion_id": "2266650106",
"pr_number": 36401,
"pr_file": "rust/batch-import-worker/src/error/mod.rs",
"created_at": "2025-08-11T12:54:37+00:00",
"commented_code": "DEFAULT_USER_ERROR_MESSAGE\n }\n \n+#[derive(Error, Debug)]\n+#[error(\"Rate limited\")]\n+pub struct RateLimitedError {\n+ pub retry_after: Option<Duration>,\n+ #[source]\n+ pub source: reqwest::Error,\n+}\n+\n+/// Extracts a Retry-After duration if a RateLimitedError is present in the error chain\n+pub fn extract_retry_after_from_error(error: &anyhow::Error) -> Option<Duration> {\n+ if let Some(rl) = error.downcast_ref::<RateLimitedError>() {\n+ return rl.retry_after;\n+ }\n+\n+ let mut source = error.source();\n+ while let Some(err) = source {\n+ if let Some(rl) = err.downcast_ref::<RateLimitedError>() {\n+ return rl.retry_after;\n+ }\n+ source = err.source();\n+ }\n+ None\n+}\n+\n+/// Returns true if the error chain contains a reqwest::Error with HTTP 429.\n+pub fn is_rate_limited_error(error: &anyhow::Error) -> bool {",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2266650106",
"repo_full_name": "PostHog/posthog",
"pr_number": 36401,
"pr_file": "rust/batch-import-worker/src/error/mod.rs",
"discussion_id": "2266650106",
"commented_code": "@@ -42,10 +43,61 @@ pub fn get_user_message(error: &anyhow::Error) -> &str {\n DEFAULT_USER_ERROR_MESSAGE\n }\n \n+#[derive(Error, Debug)]\n+#[error(\"Rate limited\")]\n+pub struct RateLimitedError {\n+ pub retry_after: Option<Duration>,\n+ #[source]\n+ pub source: reqwest::Error,\n+}\n+\n+/// Extracts a Retry-After duration if a RateLimitedError is present in the error chain\n+pub fn extract_retry_after_from_error(error: &anyhow::Error) -> Option<Duration> {\n+ if let Some(rl) = error.downcast_ref::<RateLimitedError>() {\n+ return rl.retry_after;\n+ }\n+\n+ let mut source = error.source();\n+ while let Some(err) = source {\n+ if let Some(rl) = err.downcast_ref::<RateLimitedError>() {\n+ return rl.retry_after;\n+ }\n+ source = err.source();\n+ }\n+ None\n+}\n+\n+/// Returns true if the error chain contains a reqwest::Error with HTTP 429.\n+pub fn is_rate_limited_error(error: &anyhow::Error) -> bool {",
"comment_created_at": "2025-08-11T12:54:37+00:00",
"comment_author": "jose-sequeira",
"comment_body": "Same for here, this can be simplified\n\n```\n error.chain().any(|e| {\n e.is::<RateLimitedError>()\n || (e\n .downcast_ref::<reqwest::Error>()\n .and_then(|re| re.status())\n == Some(StatusCode::TOO_MANY_REQUESTS))\n })\n```",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,35 @@
---
title: Use error chain iterators
description: When traversing error chains in Rust, prefer using the `chain()` iterator
method over manual source traversal with while loops. The `chain()` method provides
a more idiomatic and readable approach to walking through error chains, eliminating
the need for manual loop management and making the code more concise.
repository: PostHog/posthog
label: Error Handling
language: Rust
comments_count: 2
repository_stars: 28460
---
When traversing error chains in Rust, prefer using the `chain()` iterator method over manual source traversal with while loops. The `chain()` method provides a more idiomatic and readable approach to walking through error chains, eliminating the need for manual loop management and making the code more concise.

Instead of manually iterating through error sources:
```rust
let mut source = error.source();
while let Some(err) = source {
if let Some(rl) = err.downcast_ref::<RateLimitedError>() {
return rl.retry_after;
}
source = err.source();
}
```

Use the iterator-based approach:
```rust
error
.chain()
.find_map(|e| e.downcast_ref::<RateLimitedError>())
.and_then(|rl| rl.retry_after)
```
This pattern works well with other iterator methods like `any()`, `find()`, and `filter_map()` to create more expressive and maintainable error handling code.
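For example, a combined rate-limit check collapses into a single `any()` over the chain - a sketch assuming a `RateLimitedError` type and `reqwest` in scope:

```rust
use reqwest::StatusCode;

pub fn is_rate_limited_error(error: &anyhow::Error) -> bool {
    // True if anything in the chain is our typed error or a raw HTTP 429
    error.chain().any(|e| {
        e.is::<RateLimitedError>()
            || e.downcast_ref::<reqwest::Error>().and_then(|re| re.status())
                == Some(StatusCode::TOO_MANY_REQUESTS)
    })
}
```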

View File

@@ -0,0 +1,24 @@
[
{
"discussion_id": "2241445914",
"pr_number": 35239,
"pr_file": "frontend/src/scenes/surveys/SurveyResponseLimitWidget.tsx",
"created_at": "2025-07-30T03:25:00+00:00",
"commented_code": "+import { IconInfo } from '@posthog/icons'\n+import { LemonBanner, Link, Tooltip } from '@posthog/lemon-ui'\n+import { useValues } from 'kea'\n+import { billingLogic } from 'scenes/billing/billingLogic'\n+import { userLogic } from 'scenes/userLogic'\n+\n+const DEFAULT_SURVEY_RESPONSE_LIMIT = 250\n+\n+export function SurveyResponseLimitWidget(): JSX.Element | null {\n+ const { billing } = useValues(billingLogic)\n+ const { user } = useValues(userLogic)\n+\n+ // Only show for non-admin users\n+ if (user?.is_staff || user?.is_impersonated) {",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2241445914",
"repo_full_name": "PostHog/posthog",
"pr_number": 35239,
"pr_file": "frontend/src/scenes/surveys/SurveyResponseLimitWidget.tsx",
"discussion_id": "2241445914",
"commented_code": "@@ -0,0 +1,64 @@\n+import { IconInfo } from '@posthog/icons'\n+import { LemonBanner, Link, Tooltip } from '@posthog/lemon-ui'\n+import { useValues } from 'kea'\n+import { billingLogic } from 'scenes/billing/billingLogic'\n+import { userLogic } from 'scenes/userLogic'\n+\n+const DEFAULT_SURVEY_RESPONSE_LIMIT = 250\n+\n+export function SurveyResponseLimitWidget(): JSX.Element | null {\n+ const { billing } = useValues(billingLogic)\n+ const { user } = useValues(userLogic)\n+\n+ // Only show for non-admin users\n+ if (user?.is_staff || user?.is_impersonated) {",
"comment_created_at": "2025-07-30T03:25:00+00:00",
"comment_author": "lucasheriques",
"comment_body": "`is_staff` and `is_impersonated` are not for user roles, but instead, Django attributes to determine wether the current user has access to the admin panel or if they are impersonating someone\r\n\r\nfor checking if they are an admin, you can instead import the organization logic like this:\r\n\r\n`const { isAdminOrOwner } = useValues(organizationLogic)`\r\n\r\nthere are some examples in the codebase",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,27 @@
---
title: Use proper authorization attributes
description: Avoid using Django framework attributes like `is_staff` and `is_impersonated`
for application role checking, as these serve different purposes (admin panel access
and user impersonation respectively). Instead, use application-specific role validation
methods to ensure proper authorization.
repository: PostHog/posthog
label: Security
language: TSX
comments_count: 1
repository_stars: 28460
---
Avoid using Django framework attributes like `is_staff` and `is_impersonated` for application role checking, as these serve different purposes (admin panel access and user impersonation respectively). Instead, use application-specific role validation methods to ensure proper authorization.
For role-based access control, import and use the appropriate organization logic:
```typescript
import { useValues } from 'kea'
import { organizationLogic } from 'scenes/organizationLogic' // import path may vary by module

const { isAdminOrOwner } = useValues(organizationLogic)

// Use this instead of user?.is_staff
if (isAdminOrOwner) {
    // Admin/owner specific logic
}
```
This prevents potential authorization bypass vulnerabilities that could occur when framework attributes are misused for application-level access control decisions.
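Applied at the component level, the same value can gate rendering entirely. A sketch (component name is hypothetical; import path may differ):
```tsx
import { useValues } from 'kea'
import { organizationLogic } from 'scenes/organizationLogic'

export function AdminOnlyPanel(): JSX.Element | null {
    const { isAdminOrOwner } = useValues(organizationLogic)

    // Render nothing for members without admin/owner access
    if (!isAdminOrOwner) {
        return null
    }

    return <div>Organization-level settings go here</div>
}
```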

View File

@@ -0,0 +1,46 @@
[
{
"discussion_id": "2259951821",
"pr_number": 36297,
"pr_file": "plugin-server/src/cdp/utils.ts",
"created_at": "2025-08-07T13:11:01+00:00",
"commented_code": "}\n }\n \n+ let properties = data.event.properties\n+\n+ if ('exception_props' in properties) {",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2260283082",
"repo_full_name": "PostHog/posthog",
"pr_number": 36297,
"pr_file": "plugin-server/src/cdp/utils.ts",
"discussion_id": "2259951821",
"commented_code": "@@ -112,6 +112,13 @@ export function convertInternalEventToHogFunctionInvocationGlobals(\n }\n }\n \n+ let properties = data.event.properties\n+\n+ if ('exception_props' in properties) {",
"comment_created_at": "2025-08-07T13:11:01+00:00",
"comment_author": "benjackwhite",
"comment_body": "Spread wont fail if it is undefined or null so thats rubbish, but probably good to at least check that its truthy.",
"pr_file_module": null
}
]
},
{
"discussion_id": "2082100538",
"pr_number": 32090,
"pr_file": "plugin-server/src/worker/ingestion/event-pipeline/prepareEventStep.ts",
"created_at": "2025-05-09T16:49:36+00:00",
"commented_code": "uuid!, // it will throw if it's undefined,\n processPerson\n )\n+ if (event.now) {\n+ const capturedAtDateTime = DateTime.fromISO(event.now).toUTC()\n+ preIngestionEvent.capturedAt = capturedAtDateTime.isValid",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2082100538",
"repo_full_name": "PostHog/posthog",
"pr_number": 32090,
"pr_file": "plugin-server/src/worker/ingestion/event-pipeline/prepareEventStep.ts",
"discussion_id": "2082100538",
"commented_code": "@@ -42,6 +43,12 @@ export async function prepareEventStep(\n uuid!, // it will throw if it's undefined,\n processPerson\n )\n+ if (event.now) {\n+ const capturedAtDateTime = DateTime.fromISO(event.now).toUTC()\n+ preIngestionEvent.capturedAt = capturedAtDateTime.isValid",
"comment_created_at": "2025-05-09T16:49:36+00:00",
"comment_author": "nickbest-ph",
"comment_body": "NOTE: event.now should always be set and be valid, but if it is not and this gets set to `undefined` based on my testing in local development we'll get `1970-01-01 00:00:00.000000` as the value in clickhouse.",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,38 @@
---
title: Validate before use
description: Always validate that values are truthy or defined before using them,
even when they are expected to exist. This prevents runtime errors and unexpected
behavior when assumptions about data presence are violated.
repository: PostHog/posthog
label: Null Handling
language: TypeScript
comments_count: 2
repository_stars: 28460
---
Always validate that values are truthy or defined before using them, even when they are expected to exist. This prevents runtime errors and unexpected behavior when assumptions about data presence are violated.
The pattern helps catch edge cases where expected values might be missing, null, or undefined, avoiding silent failures or unexpected default behaviors.
Example:
```typescript
// Instead of assuming properties exists
let properties = data.event.properties
if ('exception_props' in properties) {
    // use properties
}

// Validate it's truthy first
let properties = data.event.properties
if (properties && 'exception_props' in properties) {
    // use properties
}

// For expected values that might be missing
if (event.now) {
    const capturedAtDateTime = DateTime.fromISO(event.now).toUTC()
    preIngestionEvent.capturedAt = capturedAtDateTime.isValid ? capturedAtDateTime : undefined
}
```
This approach prevents scenarios like getting unexpected fallback values (e.g., "1970-01-01 00:00:00.000000" in databases) when validation fails silently.

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,34 @@
---
title: Validate inputs recursively
description: Always implement recursive validation and sanitization for user inputs,
especially when dealing with encoded content or external data sources. Single-pass
validation can be bypassed through multiple encoding layers or nested attacks.
repository: PostHog/posthog
label: Security
language: Python
comments_count: 3
repository_stars: 28460
---
Always implement recursive validation and sanitization for user inputs, especially when dealing with encoded content or external data sources. Single-pass validation can be bypassed through multiple encoding layers or nested attacks.
When validating URLs, decode recursively until no further changes occur to prevent encoding bypass attacks like `javascript%253Aalert(1)`, which decodes through two layers to become `javascript:alert(1)`. Similarly, when integrating with external services, never trust their validation; always re-validate on your end.
Example of secure URL validation:
```python
from urllib.parse import unquote, urlparse

def _is_safe_url(self, url: str) -> bool:
"""Validate URL with recursive decoding to prevent bypass attacks."""
# Recursively decode until no changes to prevent encoding bypasses
decoded = url
while True:
new_decoded = unquote(decoded)
if new_decoded == decoded:
break
decoded = new_decoded
# Now validate the fully decoded URL
parsed = urlparse(decoded.lower())
return parsed.scheme in self.ALLOWED_SCHEMES
```
This approach prevents attackers from using multiple encoding layers to bypass validation and ensures that external data sources are not blindly trusted for security-critical decisions like email verification status.
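As a standalone illustration of why the loop matters, a single `unquote` pass leaves the scheme hidden, while the recursive decode exposes it:
```python
from urllib.parse import unquote

url = "javascript%253Aalert(1)"  # double-encoded payload from above

decoded = url
while True:
    new_decoded = unquote(decoded)
    if new_decoded == decoded:
        break
    decoded = new_decoded

assert unquote(url) == "javascript%3Aalert(1)"  # one pass: ':' still encoded
assert decoded == "javascript:alert(1)"         # full decode reveals the scheme
```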

View File

@@ -0,0 +1,94 @@
[
{
"discussion_id": "2231080506",
"pr_number": 35636,
"pr_file": "plugin-server/cassandra/migrations/003_person_event_occurrences.cql",
"created_at": "2025-07-25T13:25:28+00:00",
"commented_code": "+-- Migration: Create person event occurrences table\n+-- Created: 2025-07-25\n+-- Description: Table to track whether a person has performed a specific event\n+-- Optimized for occurrence tracking (set semantics) with efficient queries by person_id and event_name\n+\n+-- Create the person event occurrences table\n+-- This table stores occurrences with the pattern:\n+-- team_id:person_id:event_name\n+-- If a record exists, the event occurred\n+CREATE TABLE IF NOT EXISTS person_event_occurrences (\n+ team_id INT,",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2231080506",
"repo_full_name": "PostHog/posthog",
"pr_number": 35636,
"pr_file": "plugin-server/cassandra/migrations/003_person_event_occurrences.cql",
"discussion_id": "2231080506",
"commented_code": "@@ -0,0 +1,24 @@\n+-- Migration: Create person event occurrences table\n+-- Created: 2025-07-25\n+-- Description: Table to track whether a person has performed a specific event\n+-- Optimized for occurrence tracking (set semantics) with efficient queries by person_id and event_name\n+\n+-- Create the person event occurrences table\n+-- This table stores occurrences with the pattern:\n+-- team_id:person_id:event_name\n+-- If a record exists, the event occurred\n+CREATE TABLE IF NOT EXISTS person_event_occurrences (\n+ team_id INT,",
"comment_created_at": "2025-07-25T13:25:28+00:00",
"comment_author": "meikelmosby",
"comment_body": "question here is if personId and eventName is enough or do we _want_ to also have the teamId? ",
"pr_file_module": null
}
]
},
{
"discussion_id": "2254181438",
"pr_number": 36191,
"pr_file": "plugin-server/src/cdp/segment/__snapshots__/segment-templates.test.ts.snap",
"created_at": "2025-08-05T12:21:45+00:00",
"commented_code": "\"type\": \"string\",\n },\n {\n- \"default\": \"{event.context.group_id ?? event.groupId}\",\n+ \"default\": \"{event.context.group_id ?? event.properties.$group_0}\",",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2254181438",
"repo_full_name": "PostHog/posthog",
"pr_number": 36191,
"pr_file": "plugin-server/src/cdp/segment/__snapshots__/segment-templates.test.ts.snap",
"discussion_id": "2254181438",
"commented_code": "@@ -11030,7 +11030,7 @@ exports[`segment templates template segment-actions-hyperengage matches expected\n \"type\": \"string\",\n },\n {\n- \"default\": \"{event.context.group_id ?? event.groupId}\",\n+ \"default\": \"{event.context.group_id ?? event.properties.$group_0}\",",
"comment_created_at": "2025-08-05T12:21:45+00:00",
"comment_author": "benjackwhite",
"comment_body": "Are you sure thats the right property? Groups aren't specified that way...",
"pr_file_module": null
},
{
"comment_id": "2254256348",
"repo_full_name": "PostHog/posthog",
"pr_number": 36191,
"pr_file": "plugin-server/src/cdp/segment/__snapshots__/segment-templates.test.ts.snap",
"discussion_id": "2254181438",
"commented_code": "@@ -11030,7 +11030,7 @@ exports[`segment templates template segment-actions-hyperengage matches expected\n \"type\": \"string\",\n },\n {\n- \"default\": \"{event.context.group_id ?? event.groupId}\",\n+ \"default\": \"{event.context.group_id ?? event.properties.$group_0}\",",
"comment_created_at": "2025-08-05T12:53:00+00:00",
"comment_author": "MarconLP",
"comment_body": "I assume you are referring to `event.context.group_id`. Adding a step to filter that property out entirely.",
"pr_file_module": null
},
{
"comment_id": "2254278237",
"repo_full_name": "PostHog/posthog",
"pr_number": 36191,
"pr_file": "plugin-server/src/cdp/segment/__snapshots__/segment-templates.test.ts.snap",
"discussion_id": "2254181438",
"commented_code": "@@ -11030,7 +11030,7 @@ exports[`segment templates template segment-actions-hyperengage matches expected\n \"type\": \"string\",\n },\n {\n- \"default\": \"{event.context.group_id ?? event.groupId}\",\n+ \"default\": \"{event.context.group_id ?? event.properties.$group_0}\",",
"comment_created_at": "2025-08-05T13:01:45+00:00",
"comment_author": "benjackwhite",
"comment_body": "No I mean that groups are done like `$groups: { key: value }` in events i thought?",
"pr_file_module": null
},
{
"comment_id": "2254299070",
"repo_full_name": "PostHog/posthog",
"pr_number": 36191,
"pr_file": "plugin-server/src/cdp/segment/__snapshots__/segment-templates.test.ts.snap",
"discussion_id": "2254181438",
"commented_code": "@@ -11030,7 +11030,7 @@ exports[`segment templates template segment-actions-hyperengage matches expected\n \"type\": \"string\",\n },\n {\n- \"default\": \"{event.context.group_id ?? event.groupId}\",\n+ \"default\": \"{event.context.group_id ?? event.properties.$group_0}\",",
"comment_created_at": "2025-08-05T13:09:30+00:00",
"comment_author": "MarconLP",
"comment_body": "We do both, but you don't need to know the group key to use `$group_0`\r\n<img width=\"1125\" height=\"375\" alt=\"2025-08-05 at 15 07 28\" src=\"https://github.com/user-attachments/assets/72f3e304-9d64-431d-9fd3-65516395c368\" />\r\n\r\nhttps://us.posthog.com/project/35787/events/01986539-81fc-72be-b42d-e3763c76a220/2025-08-01T12%3A42%3A04.795000%2B02%3A00\r\n",
"pr_file_module": null
},
{
"comment_id": "2254382617",
"repo_full_name": "PostHog/posthog",
"pr_number": 36191,
"pr_file": "plugin-server/src/cdp/segment/__snapshots__/segment-templates.test.ts.snap",
"discussion_id": "2254181438",
"commented_code": "@@ -11030,7 +11030,7 @@ exports[`segment templates template segment-actions-hyperengage matches expected\n \"type\": \"string\",\n },\n {\n- \"default\": \"{event.context.group_id ?? event.groupId}\",\n+ \"default\": \"{event.context.group_id ?? event.properties.$group_0}\",",
"comment_created_at": "2025-08-05T13:39:39+00:00",
"comment_author": "benjackwhite",
"comment_body": "But now the user needs to know the group number right? It just feels like a weird. I think I get why its this way because we dont know the group key upfront but its more confusing for user inputting values \ud83e\udd14 ",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,32 @@
---
title: Validate schema decisions
description: When reviewing database schema changes or data structure modifications,
ensure that field inclusion/exclusion decisions are explicitly justified and documented.
Question ambiguous schema choices and require clear rationale for data organization
patterns.
repository: PostHog/posthog
label: Database
language: Other
comments_count: 2
repository_stars: 28460
---
When reviewing database schema changes or data structure modifications, ensure that field inclusion/exclusion decisions are explicitly justified and documented. Question ambiguous schema choices and require clear rationale for data organization patterns.
Schema decisions should address:
- Why specific fields are included or omitted (e.g., "Do we need team_id in addition to person_id and event_name?")
- How the chosen structure supports expected query patterns
- Whether data access methods are intuitive for developers
Example from schema design:
```sql
-- Migration: Create person event occurrences table
-- Question: Is team_id necessary alongside person_id and event_name?
CREATE TABLE IF NOT EXISTS person_event_occurrences (
team_id INT, -- Rationale needed: Does this enable required queries?
person_id TEXT,
    event_name TEXT,
    -- Illustrative key: CQL tables require an explicit primary key; the
    -- partition/clustering split should follow the documented query patterns
    PRIMARY KEY ((team_id, person_id), event_name)
);
```
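For example, the key layout above is exactly what makes the expected lookup cheap (illustrative values):
```sql
-- With PRIMARY KEY ((team_id, person_id), event_name), this lookup
-- reads a single partition and scans its clustering column
SELECT event_name
FROM person_event_occurrences
WHERE team_id = 2 AND person_id = 'person-abc';
```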
When data property references seem unclear or confusing (like choosing between `$groups: { key: value }` vs `$group_0`), document the reasoning behind the chosen approach and consider developer experience in accessing the data.

View File

@@ -0,0 +1,70 @@
[
{
"discussion_id": "2275746221",
"pr_number": 36271,
"pr_file": "plugin-server/src/cdp/services/messaging/recipient-preferences.service.ts",
"created_at": "2025-08-14T07:36:24+00:00",
"commented_code": "+import { HogFlowAction } from '../../../schema/hogflow'\n+import { CyclotronJobInvocationHogFlow } from '../../types'\n+import { RecipientsManagerService } from '../managers/recipients-manager.service'\n+\n+export class RecipientPreferencesService {\n+ constructor(private recipientsManager: RecipientsManagerService) {}\n+\n+ public async shouldSkipAction(invocation: CyclotronJobInvocationHogFlow, action: HogFlowAction): Promise<boolean> {\n+ return (\n+ this.isSubjectToRecipientPreferences(action) && (await this.isRecipientOptedOutOfAction(invocation, action))\n+ )\n+ }\n+\n+ private isSubjectToRecipientPreferences(\n+ action: HogFlowAction\n+ ): action is Extract<HogFlowAction, { type: 'function_email' | 'function_sms' }> {\n+ return ['function_email', 'function_sms'].includes(action.type)\n+ }\n+\n+ private async isRecipientOptedOutOfAction(\n+ invocation: CyclotronJobInvocationHogFlow,\n+ action: Extract<HogFlowAction, { type: 'function_email' | 'function_sms' }>\n+ ): Promise<boolean> {\n+ // Get the identifier to be used from the action config for sms, this is an input called to_number,\n+ // for email it is inside an input called email, specifically email.to.\n+ let identifier\n+\n+ if (action.type === 'function_sms') {\n+ identifier = action.config.inputs?.to_number\n+ } else if (action.type === 'function_email') {\n+ identifier = action.config.inputs?.email?.value?.to\n+ }\n+\n+ if (!identifier) {\n+ throw new Error(`No identifier found for message action ${action.id}`)\n+ }\n+\n+ try {\n+ const recipient = await this.recipientsManager.get({\n+ teamId: invocation.teamId,\n+ identifier: identifier,\n+ })\n+\n+ if (recipient) {\n+ // Grab the recipient preferences for the action category\n+ const categoryId = action.config.message_category_id || '$all'\n+\n+ const messageCategoryPreference = this.recipientsManager.getPreference(recipient, categoryId)\n+ const allMarketingPreferences = this.recipientsManager.getAllMarketingMessagingPreference(recipient)\n+\n+ /**\n+ * NB: A recipient may have opted out of all marketing messaging but NOT a specific category,\n+ * so we always check both.\n+ *\n+ * This would commonly happen if the recipient opted out before the category was created.\n+ */\n+ if (messageCategoryPreference === 'OPTED_OUT' || allMarketingPreferences === 'OPTED_OUT') {\n+ return true\n+ }\n+ }\n+\n+ return false",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2275746221",
"repo_full_name": "PostHog/posthog",
"pr_number": 36271,
"pr_file": "plugin-server/src/cdp/services/messaging/recipient-preferences.service.ts",
"discussion_id": "2275746221",
"commented_code": "@@ -0,0 +1,69 @@\n+import { HogFlowAction } from '../../../schema/hogflow'\n+import { CyclotronJobInvocationHogFlow } from '../../types'\n+import { RecipientsManagerService } from '../managers/recipients-manager.service'\n+\n+export class RecipientPreferencesService {\n+ constructor(private recipientsManager: RecipientsManagerService) {}\n+\n+ public async shouldSkipAction(invocation: CyclotronJobInvocationHogFlow, action: HogFlowAction): Promise<boolean> {\n+ return (\n+ this.isSubjectToRecipientPreferences(action) && (await this.isRecipientOptedOutOfAction(invocation, action))\n+ )\n+ }\n+\n+ private isSubjectToRecipientPreferences(\n+ action: HogFlowAction\n+ ): action is Extract<HogFlowAction, { type: 'function_email' | 'function_sms' }> {\n+ return ['function_email', 'function_sms'].includes(action.type)\n+ }\n+\n+ private async isRecipientOptedOutOfAction(\n+ invocation: CyclotronJobInvocationHogFlow,\n+ action: Extract<HogFlowAction, { type: 'function_email' | 'function_sms' }>\n+ ): Promise<boolean> {\n+ // Get the identifier to be used from the action config for sms, this is an input called to_number,\n+ // for email it is inside an input called email, specifically email.to.\n+ let identifier\n+\n+ if (action.type === 'function_sms') {\n+ identifier = action.config.inputs?.to_number\n+ } else if (action.type === 'function_email') {\n+ identifier = action.config.inputs?.email?.value?.to\n+ }\n+\n+ if (!identifier) {\n+ throw new Error(`No identifier found for message action ${action.id}`)\n+ }\n+\n+ try {\n+ const recipient = await this.recipientsManager.get({\n+ teamId: invocation.teamId,\n+ identifier: identifier,\n+ })\n+\n+ if (recipient) {\n+ // Grab the recipient preferences for the action category\n+ const categoryId = action.config.message_category_id || '$all'\n+\n+ const messageCategoryPreference = this.recipientsManager.getPreference(recipient, categoryId)\n+ const allMarketingPreferences = this.recipientsManager.getAllMarketingMessagingPreference(recipient)\n+\n+ /**\n+ * NB: A recipient may have opted out of all marketing messaging but NOT a specific category,\n+ * so we always check both.\n+ *\n+ * This would commonly happen if the recipient opted out before the category was created.\n+ */\n+ if (messageCategoryPreference === 'OPTED_OUT' || allMarketingPreferences === 'OPTED_OUT') {\n+ return true\n+ }\n+ }\n+\n+ return false",
"comment_created_at": "2025-08-14T07:36:24+00:00",
"comment_author": "meikelmosby",
"comment_body": "so this means if we do not find an recipient we return `false`?",
"pr_file_module": null
},
{
"comment_id": "2283351466",
"repo_full_name": "PostHog/posthog",
"pr_number": 36271,
"pr_file": "plugin-server/src/cdp/services/messaging/recipient-preferences.service.ts",
"discussion_id": "2275746221",
"commented_code": "@@ -0,0 +1,69 @@\n+import { HogFlowAction } from '../../../schema/hogflow'\n+import { CyclotronJobInvocationHogFlow } from '../../types'\n+import { RecipientsManagerService } from '../managers/recipients-manager.service'\n+\n+export class RecipientPreferencesService {\n+ constructor(private recipientsManager: RecipientsManagerService) {}\n+\n+ public async shouldSkipAction(invocation: CyclotronJobInvocationHogFlow, action: HogFlowAction): Promise<boolean> {\n+ return (\n+ this.isSubjectToRecipientPreferences(action) && (await this.isRecipientOptedOutOfAction(invocation, action))\n+ )\n+ }\n+\n+ private isSubjectToRecipientPreferences(\n+ action: HogFlowAction\n+ ): action is Extract<HogFlowAction, { type: 'function_email' | 'function_sms' }> {\n+ return ['function_email', 'function_sms'].includes(action.type)\n+ }\n+\n+ private async isRecipientOptedOutOfAction(\n+ invocation: CyclotronJobInvocationHogFlow,\n+ action: Extract<HogFlowAction, { type: 'function_email' | 'function_sms' }>\n+ ): Promise<boolean> {\n+ // Get the identifier to be used from the action config for sms, this is an input called to_number,\n+ // for email it is inside an input called email, specifically email.to.\n+ let identifier\n+\n+ if (action.type === 'function_sms') {\n+ identifier = action.config.inputs?.to_number\n+ } else if (action.type === 'function_email') {\n+ identifier = action.config.inputs?.email?.value?.to\n+ }\n+\n+ if (!identifier) {\n+ throw new Error(`No identifier found for message action ${action.id}`)\n+ }\n+\n+ try {\n+ const recipient = await this.recipientsManager.get({\n+ teamId: invocation.teamId,\n+ identifier: identifier,\n+ })\n+\n+ if (recipient) {\n+ // Grab the recipient preferences for the action category\n+ const categoryId = action.config.message_category_id || '$all'\n+\n+ const messageCategoryPreference = this.recipientsManager.getPreference(recipient, categoryId)\n+ const allMarketingPreferences = this.recipientsManager.getAllMarketingMessagingPreference(recipient)\n+\n+ /**\n+ * NB: A recipient may have opted out of all marketing messaging but NOT a specific category,\n+ * so we always check both.\n+ *\n+ * This would commonly happen if the recipient opted out before the category was created.\n+ */\n+ if (messageCategoryPreference === 'OPTED_OUT' || allMarketingPreferences === 'OPTED_OUT') {\n+ return true\n+ }\n+ }\n+\n+ return false",
"comment_created_at": "2025-08-18T20:09:42+00:00",
"comment_author": "havenbarnes",
"comment_body": "Hmm yes nice catch - I thought this was the correct behavior but actually it should be `true`. I'll leave a comment saying so, but if someone's never given their preference to PostHog, we can assume they've opted in to messaging via the PostHog user's app / TOS",
"pr_file_module": null
},
{
"comment_id": "2283638210",
"repo_full_name": "PostHog/posthog",
"pr_number": 36271,
"pr_file": "plugin-server/src/cdp/services/messaging/recipient-preferences.service.ts",
"discussion_id": "2275746221",
"commented_code": "@@ -0,0 +1,69 @@\n+import { HogFlowAction } from '../../../schema/hogflow'\n+import { CyclotronJobInvocationHogFlow } from '../../types'\n+import { RecipientsManagerService } from '../managers/recipients-manager.service'\n+\n+export class RecipientPreferencesService {\n+ constructor(private recipientsManager: RecipientsManagerService) {}\n+\n+ public async shouldSkipAction(invocation: CyclotronJobInvocationHogFlow, action: HogFlowAction): Promise<boolean> {\n+ return (\n+ this.isSubjectToRecipientPreferences(action) && (await this.isRecipientOptedOutOfAction(invocation, action))\n+ )\n+ }\n+\n+ private isSubjectToRecipientPreferences(\n+ action: HogFlowAction\n+ ): action is Extract<HogFlowAction, { type: 'function_email' | 'function_sms' }> {\n+ return ['function_email', 'function_sms'].includes(action.type)\n+ }\n+\n+ private async isRecipientOptedOutOfAction(\n+ invocation: CyclotronJobInvocationHogFlow,\n+ action: Extract<HogFlowAction, { type: 'function_email' | 'function_sms' }>\n+ ): Promise<boolean> {\n+ // Get the identifier to be used from the action config for sms, this is an input called to_number,\n+ // for email it is inside an input called email, specifically email.to.\n+ let identifier\n+\n+ if (action.type === 'function_sms') {\n+ identifier = action.config.inputs?.to_number\n+ } else if (action.type === 'function_email') {\n+ identifier = action.config.inputs?.email?.value?.to\n+ }\n+\n+ if (!identifier) {\n+ throw new Error(`No identifier found for message action ${action.id}`)\n+ }\n+\n+ try {\n+ const recipient = await this.recipientsManager.get({\n+ teamId: invocation.teamId,\n+ identifier: identifier,\n+ })\n+\n+ if (recipient) {\n+ // Grab the recipient preferences for the action category\n+ const categoryId = action.config.message_category_id || '$all'\n+\n+ const messageCategoryPreference = this.recipientsManager.getPreference(recipient, categoryId)\n+ const allMarketingPreferences = this.recipientsManager.getAllMarketingMessagingPreference(recipient)\n+\n+ /**\n+ * NB: A recipient may have opted out of all marketing messaging but NOT a specific category,\n+ * so we always check both.\n+ *\n+ * This would commonly happen if the recipient opted out before the category was created.\n+ */\n+ if (messageCategoryPreference === 'OPTED_OUT' || allMarketingPreferences === 'OPTED_OUT') {\n+ return true\n+ }\n+ }\n+\n+ return false",
"comment_created_at": "2025-08-18T22:34:51+00:00",
"comment_author": "havenbarnes",
"comment_body": "Oops no, the `false` is correct. Forgot this is inside `isRecipientOptedOutOfAction`. I'll refactor this and add a comment still",
"pr_file_module": null
}
]
},
{
"discussion_id": "2260699270",
"pr_number": 35926,
"pr_file": "plugin-server/src/worker/ingestion/persons/repositories/postgres-person-repository.ts",
"created_at": "2025-08-07T15:32:54+00:00",
"commented_code": "}\n }\n \n+ if (this.isPropertiesSizeConstraintViolation(error)) {\n+ // For createPerson, we just log and reject since there's no existing person to update\n+ personPropertiesSizeViolationCounter.inc({\n+ violation_type: 'create_person_size_violation',\n+ })\n+\n+ logger.warn('Rejecting person properties create/update, exceeds size limit', {\n+ team_id: teamId,\n+ person_id: undefined,\n+ violation_type: 'create_person_size_violation',\n+ })\n+\n+ throw new PersonPropertiesSizeViolationError(\n+ `Person properties create would exceed size limit`,\n+ teamId,\n+ undefined\n+ )",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2260699270",
"repo_full_name": "PostHog/posthog",
"pr_number": 35926,
"pr_file": "plugin-server/src/worker/ingestion/persons/repositories/postgres-person-repository.ts",
"discussion_id": "2260699270",
"commented_code": "@@ -213,6 +423,25 @@ export class PostgresPersonRepository\n }\n }\n \n+ if (this.isPropertiesSizeConstraintViolation(error)) {\n+ // For createPerson, we just log and reject since there's no existing person to update\n+ personPropertiesSizeViolationCounter.inc({\n+ violation_type: 'create_person_size_violation',\n+ })\n+\n+ logger.warn('Rejecting person properties create/update, exceeds size limit', {\n+ team_id: teamId,\n+ person_id: undefined,\n+ violation_type: 'create_person_size_violation',\n+ })\n+\n+ throw new PersonPropertiesSizeViolationError(\n+ `Person properties create would exceed size limit`,\n+ teamId,\n+ undefined\n+ )",
"comment_created_at": "2025-08-07T15:32:54+00:00",
"comment_author": "pl",
"comment_body": "question: Are we failing gracefully in this case? I could not find the code that handles the size violation exceptions when creating a person, but I might be looking wrong. I think it would be good to test it at the `person-*-service` level. ",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,39 @@
---
title: Verify error handling paths
description: When implementing error handling logic, ensure that both the behavior
and reasoning are clear, and that error paths are properly tested at appropriate
levels. For methods that handle missing data, the default behavior should be explicitly
documented and the method name should clearly indicate what the return value represents.
For exception handling, verify...
repository: PostHog/posthog
label: Error Handling
language: TypeScript
comments_count: 2
repository_stars: 28460
---
When implementing error handling logic, ensure that both the behavior and reasoning are clear, and that error paths are properly tested at appropriate levels. For methods that handle missing data, the default behavior should be explicitly documented and the method name should clearly indicate what the return value represents. For exception handling, verify that upstream code properly catches and handles the exceptions to ensure graceful failure.
Example from the discussions:
```typescript
// Good: Clear method name and documented default behavior
private async isRecipientOptedOutOfAction(
    invocation: CyclotronJobInvocationHogFlow,
    action: HogFlowAction
): Promise<boolean> {
// ... logic to find recipient
if (!recipient) {
// Default to opted-in if no preference exists (per TOS)
return false; // false = not opted out = can send message
}
// ... rest of logic
}
// When throwing exceptions, ensure upstream handling exists
if (this.isPropertiesSizeConstraintViolation(error)) {
logger.warn('Rejecting person properties create/update, exceeds size limit', {
team_id: teamId,
violation_type: 'create_person_size_violation',
});
throw new PersonPropertiesSizeViolationError(/* ... */);
}
```
Always verify that exception handling is tested at the service level to ensure the application fails gracefully rather than crashing unexpectedly.
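A service-level test could look like this sketch (Jest-style; the service, repository, and event names are hypothetical stand-ins):
```typescript
it('fails gracefully when person properties exceed the size limit', async () => {
    // Simulate the repository rejecting with the documented violation error
    jest.spyOn(repository, 'createPerson').mockRejectedValue(
        new PersonPropertiesSizeViolationError('create would exceed size limit', teamId, undefined)
    )

    // The service should catch the violation and resolve rather than crash
    await expect(personService.handleEvent(event)).resolves.toBeDefined()
})
```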

View File

@@ -0,0 +1,24 @@
[
{
"discussion_id": "2261639125",
"pr_number": 36339,
"pr_file": "posthog/templates/email/personal_api_key_exposed.html",
"created_at": "2025-08-08T01:36:39+00:00",
"commented_code": "+{% extends \"email/base.html\" %} {% load posthog_assets %} {% load posthog_filters %}\n+{% block heading %}Personal API Key has been deactivated{% endblock %}\n+{% block section %}\n+<p>\n+ Your Personal API Key <strong>{{ label }}</strong> with value <strong>{{ mask_value }}</strong> was publicly exposed.\n+ {% if more_info %}{{ more_info }}{% endif %}",
"repo_full_name": "PostHog/posthog",
"discussion_comments": [
{
"comment_id": "2261754566",
"repo_full_name": "PostHog/posthog",
"pr_number": 36339,
"pr_file": "posthog/templates/email/personal_api_key_exposed.html",
"discussion_id": "2261639125",
"commented_code": "@@ -0,0 +1,21 @@\n+{% extends \"email/base.html\" %} {% load posthog_assets %} {% load posthog_filters %}\n+{% block heading %}Personal API Key has been deactivated{% endblock %}\n+{% block section %}\n+<p>\n+ Your Personal API Key <strong>{{ label }}</strong> with value <strong>{{ mask_value }}</strong> was publicly exposed.\n+ {% if more_info %}{{ more_info }}{% endif %}",
"comment_created_at": "2025-08-08T01:36:39+00:00",
"comment_author": "Piccirello",
"comment_body": "Django templates escape HTML by default when `{{ }}` is used. Also verified that supplying a `more_info` value of `<img src=x />` results in the text being printed, rather than rendered as html.",
"pr_file_module": null
}
]
}
]

View File

@@ -0,0 +1,28 @@
---
title: Verify HTML escaping
description: Always verify that user-controlled content in templates is properly HTML-escaped
to prevent XSS attacks. Don't just assume framework defaults are working; actively
test with potentially malicious input to confirm that HTML tags are rendered as
text rather than executed.
repository: PostHog/posthog
label: Security
language: Html
comments_count: 1
repository_stars: 28460
---
Always verify that user-controlled content in templates is properly HTML-escaped to prevent XSS attacks. Don't just assume framework defaults are working; actively test with potentially malicious input to confirm that HTML tags are rendered as text rather than executed.
When displaying dynamic content in templates, test with HTML payloads like `<img src=x />` or `<script>alert('xss')</script>` to ensure they appear as literal text. For Django templates, confirm that the standard `{{ variable }}` syntax properly escapes HTML characters, converting `<` to `&lt;`, `>` to `&gt;`, etc.
Example verification:
```html
<!-- Template: -->
<p>API Key: <strong>{{ more_info }}</strong></p>
<!-- Test input: more_info = "<img src=x />" -->
<!-- Expected output: API Key: <strong>&lt;img src=x /&gt;</strong> -->
<!-- NOT: API Key: <strong><img src=x /></strong> -->
```
This practice helps catch cases where unsafe rendering methods might be accidentally used or where framework protections might not apply.
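In a Django test (with settings configured), the verification can be automated; a minimal sketch:
```python
from django.template import Context, Template

def test_more_info_is_escaped():
    rendered = Template("<p>API Key: <strong>{{ more_info }}</strong></p>").render(
        Context({"more_info": "<img src=x />"})
    )

    # The payload must appear as escaped text, never as a live tag
    assert "&lt;img src=x /&gt;" in rendered
    assert "<img src=x />" not in rendered
```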