The Config.Validate() method was not calling CloneConfig.Validate(),
which could allow invalid clone detection configurations to pass through.
This fix ensures proper validation of all clone-related settings including
thresholds, filtering, and analysis parameters.
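A minimal sketch of the delegation, with reduced, illustrative structs (the real CloneConfig covers many more settings):

```go
package config

import "fmt"

// Illustrative, reduced structs; the real CloneConfig covers thresholds,
// filtering, and analysis parameters.
type CloneConfig struct {
	SimilarityThreshold float64
}

func (c *CloneConfig) Validate() error {
	if c.SimilarityThreshold <= 0 || c.SimilarityThreshold > 1 {
		return fmt.Errorf("similarity threshold must be in (0, 1], got %v", c.SimilarityThreshold)
	}
	return nil
}

type Config struct {
	Clones CloneConfig
}

// Validate now delegates to CloneConfig.Validate, so invalid clone
// settings can no longer pass through unchecked.
func (c *Config) Validate() error {
	if err := c.Clones.Validate(); err != nil {
		return fmt.Errorf("clone config: %w", err)
	}
	return nil
}
```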
Update ARCHITECTURE.md to clearly indicate that caching mechanisms
(AST, CFG, LSH caching) are planned for future releases and not yet
implemented in v1.0.0. This ensures documentation accurately reflects
the current implementation state.
**Problem:**
CBO analysis was only counting High Risk classes (CBO > 7) for penalty
calculation, completely ignoring Medium Risk classes (3 < CBO ≤ 7).
Example: a project with 364 classes and multiple classes at CBO=6 still
scored 100/100, because every one of them was "Medium Risk" and therefore
not counted.
**Solution:**
1. Added `MediumCouplingClasses` field to track Medium Risk classes
2. Updated penalty calculation to use weighted ratio:
- High Risk classes: weight = 1.0
- Medium Risk classes: weight = 0.5
Formula: (HighRisk × 1.0 + MediumRisk × 0.5) / TotalClasses
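In code, a sketch of this ratio (helper and parameter names are illustrative; MediumCouplingClasses is the new field):

```go
package domain

// Illustrative helper; MediumCouplingClasses is the new field, the other
// identifiers are assumptions.
func weightedCouplingRatio(highRisk, mediumRisk, totalClasses int) float64 {
	if totalClasses == 0 {
		return 0
	}
	// High Risk classes count fully; Medium Risk classes count at half weight.
	return (float64(highRisk)*1.0 + float64(mediumRisk)*0.5) / float64(totalClasses)
}
```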
**Impact:**
Projects with many Medium Risk classes will now receive appropriate penalties:
- 10% weighted ratio → -6 points (Low penalty)
- 30% weighted ratio → -12 points (Medium penalty)
- 60% weighted ratio → -20 points (High penalty)
This makes CBO scoring more realistic and catches coupling issues that were
previously ignored.
**Files Changed:**
- domain/analyze.go: Added MediumCouplingClasses field, updated penalty logic
- app/analyze_usecase.go: Set MediumCouplingClasses from CBO analysis
- domain/analyze.go: Added validation for new field
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Updated CBO (Coupling Between Objects) analysis to use industry-standard
thresholds for more accurate code quality assessment:
**CBO Risk Thresholds:**
- Low Risk: CBO ≤ 3 (was ≤ 5)
- Medium Risk: 3 < CBO ≤ 7 (was 5 < CBO ≤ 10)
- High Risk: CBO > 7 (was > 10)
**Coupling Ratio Thresholds:**
- Low: 5% of classes (was 10%)
- Medium: 15% of classes (was 30%)
- High: 30% of classes (was 50%)
**Coupling Penalties:**
- Low: 6 points (was 5)
- Medium: 12 points (was 10)
- High: 20 points (was 16, now aligned with other high penalties)
These changes make coupling detection more strict and aligned with industry
best practices, helping identify maintainability issues earlier.
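A sketch of the new classification (constant and function names are illustrative; the threshold values are the ones listed above):

```go
package analyzer

// Illustrative names; values match the new industry-standard thresholds.
const (
	cboLowThreshold    = 3 // Low Risk: CBO <= 3
	cboMediumThreshold = 7 // Medium Risk: 3 < CBO <= 7; High Risk: CBO > 7
)

func classifyCBO(cbo int) string {
	switch {
	case cbo <= cboLowThreshold:
		return "low"
	case cbo <= cboMediumThreshold:
		return "medium"
	default:
		return "high"
	}
}
```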
**Files Updated:**
- domain/analyze.go: Updated coupling ratio thresholds and penalties
- domain/cbo.go: Updated default CBO thresholds with industry standard comments
- internal/analyzer/cbo.go: Updated default CBO options
- domain/analyze_test.go: Updated test expectations for new thresholds
- internal/analyzer/cbo_test.go: Updated test expectations
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixed two test cases that had incorrect expected values after updating
the duplication scoring thresholds:
1. "high_complexity" test:
- CodeDuplication 5.0% falls in 3-10% range = Low penalty (6), not Medium (12)
- Correct score: 100-20-6 = 74 (Grade B), not 68 (Grade C)
2. "grade_C_threshold" test:
- Expected score should be 58, not 59
- Calculation: 100-12-20-10 = 58
Also fixed formatting with gofmt.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Problem
LSH (Locality-Sensitive Hashing) mode was producing inflated similarity
scores compared to APTED mode, resulting in false positive clone detection.
## Root Cause
MinHash Jaccard similarity tends to be more lenient than APTED tree edit
distance similarity due to:
- Coarse feature extraction (maxSubtreeHeight: 3, kGramSize: 4)
- Set-based similarity ignoring structural differences
- Common patterns (If, For, Assign) creating false overlaps
## Solution
Increased LSHSimilarityThreshold from 0.78 to 0.88 across all layers:
- internal/analyzer/clone_detector.go:216
- domain/clone.go:350
- internal/config/clone_config.go:200
- cmd/pyscn/init.go:84 (template)
The new threshold (0.88) is Type2Threshold (0.85) + 0.03, ensuring:
- More strict candidate filtering in LSH stage
- Reduced false positives
- Final APTED verification still provides accurate similarity
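For illustration, a sketch of the candidate filtering this threshold controls (the type and function are assumptions; only the threshold value and its derivation come from this change):

```go
package analyzer

// Only the threshold value and its derivation come from this change.
const lshSimilarityThreshold = 0.88 // Type2Threshold (0.85) + 0.03

type candidatePair struct {
	a, b       int     // fragment indices
	similarity float64 // MinHash Jaccard estimate
}

// filterCandidates keeps only pairs that clear the stricter LSH
// threshold before they reach the exact APTED verification stage.
func filterCandidates(pairs []candidatePair) []candidatePair {
	kept := make([]candidatePair, 0, len(pairs))
	for _, p := range pairs {
		if p.similarity >= lshSimilarityThreshold {
			kept = append(kept, p)
		}
	}
	return kept
}
```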
## Impact
- Fragments >= 500: LSH mode now produces comparable results to APTED
- Better precision without sacrificing recall
- Maintains performance benefits of LSH acceleration
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When running the `analyze` command with clone detection on large codebases
(500+ fragments), LSH auto-detection messages were being printed to
stderr, interrupting the unified progress bar and causing it to appear
twice.
This change removes all LSH-related diagnostic messages since:
- LSH is an internal performance optimization that users don't need to
be aware of
- The messages interfere with progress bar display
- LSH functionality continues to work automatically based on fragment
count threshold
Before:
```
Analyzing 0% | | (0/100, 0 it/hr) [0s:0s]LSH: Auto-detection - 920 fragments, threshold=500, enabled=true
Analyzing 100% |██████████████████████████████████████████████| (100/100, 22 it/s)
```
After:
```
Analyzing 100% |██████████████████████████████████████████████| (100/100, 22 it/s)
```
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add detailed descriptions for each grouping strategy
- Explain characteristics: simple/fast, high quality, balanced
- Keep default as 'connected' (simple and fast, good for most cases)
- Clarify k_core_k parameter meaning (minimum connections per node)
- Simplify clone detection by always calling DetectClonesWithLSH()
- DetectClonesWithLSH() internally checks UseLSH and delegates to DetectClonesWithContext() when disabled
- Eliminates duplicate conditional logic between service and analyzer layers
- Improves separation of concerns and code maintainability
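A minimal sketch of the resulting single entry point (types and elided bodies are stand-ins for the real implementation):

```go
package analyzer

import "context"

// Stand-in types for illustration.
type CodeFragment struct{}
type ClonePair struct{}

type CloneDetector struct {
	UseLSH bool
}

func (cd *CloneDetector) DetectClonesWithContext(ctx context.Context, fragments []*CodeFragment) ([]*ClonePair, error) {
	// Exact APTED-based detection (elided).
	return nil, nil
}

func (cd *CloneDetector) DetectClonesWithLSH(ctx context.Context, fragments []*CodeFragment) ([]*ClonePair, error) {
	if !cd.UseLSH {
		// LSH disabled: delegate, so callers never branch on UseLSH themselves.
		return cd.DetectClonesWithContext(ctx, fragments)
	}
	// LSH-accelerated candidate generation followed by APTED verification (elided).
	return nil, nil
}
```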
- Add SetBatchSizeLarge() method to CloneDetector for controlled access
- Update clone_memory_test.go to use setter instead of direct field access
- Maintains encapsulation of cloneDetectorConfig struct
Address issues found in code review:
1. Fix encapsulation violation:
- Change CloneDetectorConfig embedding from public to private
- Add SetUseLSH() method for controlled access
- Update service layer to use SetUseLSH() instead of direct field access
2. Remove dead code:
- Delete unused MemoryLimit field from CloneDetectorConfig
- Remove from type definition and default configuration
This prevents external packages from directly accessing internal
configuration fields while maintaining all functionality.
All tests pass.
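A sketch of the encapsulation pattern, reduced to a single boolean field for brevity:

```go
package analyzer

// Reduced to one field for illustration; the real config has many more.
type cloneDetectorConfig struct {
	useLSH bool
}

type CloneDetector struct {
	cloneDetectorConfig // unexported embedding: fields are no longer reachable from other packages
}

// SetUseLSH gives the service layer controlled access instead of
// direct field writes.
func (cd *CloneDetector) SetUseLSH(enabled bool) {
	cd.useLSH = enabled
}
```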
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove indirection and deprecated code related to CloneDetectorConfig:
- Embed CloneDetectorConfig directly into CloneDetector struct
- Replace all cd.config.X with cd.X (44 occurrences)
- Remove unused NewCloneDetectorFromConfig() function
- Remove DEPRECATED comments from CloneDetectorConfig and NewCloneDetector
- Update service layer to use detector.UseLSH directly
- Clean up unnecessary comments in clone_adapters.go
This simplifies the architecture while maintaining all functionality.
All tests pass.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Problem
LSH (Locality-Sensitive Hashing) acceleration was not working because:
1. LSH settings in [clones] section of .pyscn.toml were not being loaded
2. toml_loader.go expected nested [lsh] section, but config used flat structure
3. cloneConfigToCloneRequest() was not converting LSH settings to CloneRequest
4. Auto-enable logic based on fragment count was not implemented
This caused clone detection to always use slow APTED algorithm, even for
large projects where LSH would provide significant speedup.
## Solution
1. Added ClonesConfig struct to read flat [clones] section structure
2. Implemented mergeClonesSection() to load all settings including LSH
3. Extended CloneRequest with LSH fields (LSHEnabled, LSHAutoThreshold, etc.)
4. Added auto-enable logic in clone_service.go:
- "auto": enable LSH when fragments >= threshold (default: 500)
- "true": always enable LSH
- "false": always disable LSH
5. Added diagnostic messages showing LSH decision
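A sketch of the auto-enable decision (function and parameter names are illustrative; the semantics match the list above):

```go
package service

// lshEnabled resolves the configured LSH mode against the fragment count.
func lshEnabled(mode string, fragmentCount, autoThreshold int) bool {
	switch mode {
	case "true":
		return true
	case "false":
		return false
	default: // "auto": enable only for large fragment sets (default threshold: 500)
		return fragmentCount >= autoThreshold
	}
}
```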
## Changes
- domain/clone.go: Add LSH config fields to CloneRequest
- internal/config/toml_loader.go: Add ClonesConfig struct and merge logic
- service/clone_config_loader.go: Convert LSH settings to CloneRequest
- service/clone_service.go: Implement auto-enable logic based on fragment count
- .pyscn.toml: Document LSH settings (no functional change)
## Testing
- Verified LSH auto-detection with different thresholds
- Confirmed settings load correctly from .pyscn.toml
- All existing tests pass
## Related
- Fixes issue discovered during performance investigation
- Prepares for config refactoring in #124

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove 'recommended' label from pyproject.toml
- Clarify that .pyscn.toml takes precedence over pyproject.toml
- Add explicit note about behavior when both files exist
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Align with Ruff's behavior where dedicated config files take precedence
over pyproject.toml. This makes the configuration priority more intuitive:
when users explicitly create a .pyscn.toml file, it should override any
settings in pyproject.toml.
Changes:
- Update TomlConfigLoader.LoadConfig() to check .pyscn.toml first
- Update GetSupportedConfigFiles() to reflect new priority order
- Update documentation in ARCHITECTURE.md to reflect new behavior
This is a breaking change for projects that have both .pyscn.toml and
pyproject.toml with different settings.
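A sketch of the new lookup order, assuming a simple per-directory search (the real loader also parses whichever file it finds):

```go
package config

import (
	"os"
	"path/filepath"
)

// findConfigFile returns the first matching config file in dir.
// .pyscn.toml now takes precedence over pyproject.toml, matching
// Ruff's convention for dedicated config files.
func findConfigFile(dir string) (string, bool) {
	for _, name := range []string{".pyscn.toml", "pyproject.toml"} {
		path := filepath.Join(dir, name)
		if _, err := os.Stat(path); err == nil {
			return path, true
		}
	}
	return "", false
}
```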
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove all commented-out progress-related code, following the YAGNI principle.
The new unified time-based progress tracking makes these legacy comments
obsolete; they add no value to code maintainability.
Removed unnecessary progressManager nil check since progressDone is
only non-nil when progressManager is already non-nil.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Simplified the ProgressManager API by removing the taskName parameter,
which was only used for display text; the description is now hardcoded
to "Analyzing".
Changes:
- Renamed methods: StartTask → Start, CompleteTask → Complete,
UpdateProgress → Update
- Removed taskName parameter from all methods
- Hardcoded progress bar description to "Analyzing"
- Updated domain.ProgressManager interface to reflect simpler API
Rationale:
- Single task model doesn't need task identification
- "Analysis" string was scattered across 4 locations with no value
- Simpler API: Start() instead of StartTask("Analysis")
Benefits:
- Cleaner API surface
- Removed string constant duplication
- More intuitive method names (Start vs StartTask)
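A sketch of the resulting interface; the exact parameter lists may differ:

```go
package domain

// Simplified single-task progress interface after the rename.
type ProgressManager interface {
	Start(total int)    // was StartTask(taskName string, total int)
	Update(current int) // was UpdateProgress(taskName string, current int)
	Complete()          // was CompleteTask(taskName string)
}
```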
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Removed unnecessary multi-task management infrastructure that was never
used, reducing code complexity by ~30%.
Changes:
- Removed tasks map and TaskProgress struct (multi-task support)
- Removed GetTaskStatus() method (never used)
- Removed totalFiles field (redundant with maxValue)
- Simplified to single progressBar field
- Moved ProgressBar creation from StartTask to UpdateProgress
Benefits:
- Code reduction: 206 lines → 145 lines (-61 lines, -30%)
- Simpler mental model: single ProgressBar instead of map
- No functionality loss: only "Analysis" task was ever used
- Easier to maintain and understand
YAGNI applied: Removed speculative multi-task feature that added
complexity without providing value in the current implementation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Replaced multiple task-specific progress bars with a single unified
progress bar that shows overall analysis progress.
Changes:
- Added calculateEstimatedTime() to estimate total analysis time based
on file count and enabled analyses
- Implemented startFakeProgressUpdater() that updates progress based on
elapsed time rather than actual file processing
- Used conservative time estimates: O(n) for fast analyses, O(n²) for
clone detection
- Single "Analysis" progress bar replaces individual task progress bars
- Progress updates every 100ms, capping at 99% until completion
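A sketch of the estimation and update logic, with illustrative constants rather than the tuned values used by pyscn:

```go
package app

import "time"

// calculateEstimatedTime sums per-analysis estimates; constants are illustrative.
func calculateEstimatedTime(fileCount int, cloneDetectionEnabled bool) time.Duration {
	// Fast analyses scale roughly linearly with file count.
	estimate := time.Duration(fileCount) * 50 * time.Millisecond
	if cloneDetectionEnabled {
		// Clone detection is roughly quadratic, so it dominates large runs.
		estimate += time.Duration(fileCount*fileCount) * 10 * time.Millisecond
	}
	return estimate
}

// The updater maps elapsed time onto the bar every 100ms, holding at 99%
// until the analysis actually completes.
func progressPercent(start time.Time, estimated time.Duration) int {
	if estimated <= 0 {
		return 99
	}
	p := int(float64(time.Since(start)) / float64(estimated) * 100)
	if p > 99 {
		p = 99
	}
	return p
}
```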
Benefits:
- Provides meaningful progress feedback during long-running analyses
- Works with --select flag to adjust estimates for selected analyses
- Conservative estimates (7min estimated vs 3min44s actual for 207 files)
provide better UX than overly optimistic estimates
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Previously, the dead code penalty was calculated by double-counting:
- Base penalty: all dead code issues (including critical)
- Critical penalty: critical issues counted again with 3x weight
This resulted in overly harsh scoring (e.g., 0/100 for 15 critical issues).
Changes:
- Remove base penalty calculation entirely
- Calculate penalty based only on critical dead code issues
- Remove critical issue weight multiplier (was 3x, now 1x)
- Simplify logic: no double-counting, no cap checking needed
Impact:
- Test case with 15 critical issues:
- Before: 0/100 (penalty capped at 20)
- After: 35/100 (penalty of 13 points)
- Overall health score improves appropriately
- Scoring is now more intuitive and fair
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Replace hardcoded score quality thresholds (85, 70, 55) with named constants
to improve code maintainability and consistency.
Changes:
- Add ScoreThresholdExcellent, ScoreThresholdGood, ScoreThresholdFair constants to domain layer
- Update getScoreIcon() in cmd/pyscn/analyze.go to use constants
- Add scoreQuality() template helper function in service/analyze_formatter.go
- Replace all 12 instances of magic numbers in HTML template with scoreQuality() calls
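A sketch using the new constants; the quality labels returned by the helper are illustrative:

```go
package domain

// Named thresholds replace the magic numbers 85, 70, and 55.
const (
	ScoreThresholdExcellent = 85
	ScoreThresholdGood      = 70
	ScoreThresholdFair      = 55
)

// scoreQuality mirrors the mapping used by getScoreIcon and the HTML template.
func scoreQuality(score int) string {
	switch {
	case score >= ScoreThresholdExcellent:
		return "excellent"
	case score >= ScoreThresholdGood:
		return "good"
	case score >= ScoreThresholdFair:
		return "fair"
	default:
		return "poor"
	}
}
```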
Benefits:
- Single source of truth for score thresholds
- Easier to modify thresholds in the future
- Improved code readability
- Guaranteed consistency between CLI and HTML output
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove getScoreLabel function that was not being used anywhere in the codebase.
This fixes the golangci-lint unused function error.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add detailed scoring system that shows scores for each quality category
(Complexity, Dead Code, Duplication, Coupling, Dependencies, Architecture)
in addition to the overall health score.
## Changes
### Domain Layer
- Add 6 individual score fields to AnalyzeSummary struct
- Extract penalty calculation into separate methods for each category
- Add penaltyToScore() helper function to convert penalties to 0-100 scores
- Update CalculateHealthScore() to calculate and set individual scores
- Add test cases for individual score validation
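A minimal sketch of penaltyToScore, assuming a simple clamped subtraction (the real conversion may weight penalties differently):

```go
package domain

// penaltyToScore converts an accumulated penalty into a 0-100 score.
func penaltyToScore(penalty int) int {
	score := 100 - penalty
	if score < 0 {
		return 0
	}
	if score > 100 {
		return 100
	}
	return score
}
```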
### CLI Output
- Add "Detailed Scores" section showing each category score with icons
- Display scores with color-coded icons (✅👍⚠️❌)
- Include relevant metrics for each category (avg, count, etc.)
### HTML Output
- Add Quality Scores section in Summary tab with visual progress bars
- Color-coded bars: green (85+), yellow-green (70-84), orange (55-69), red (<55)
- Slim 12px progress bars for modern, clean appearance
- Add score badges to each analysis tab header
- Tab headers now display title and score together in a unified design
### JSON/YAML/CSV Output
- All formats automatically include new score fields
## Benefits
- Users can quickly identify which quality aspects need improvement
- Visual representation makes scores easier to understand
- Consistent scoring across all output formats
- Individual scores use same penalty calculation as overall health score
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Update README: position as code quality analyzer for Python vibe coders
- Update CHANGELOG: add v1.0.0 stable release entry with key features
- Remove ROADMAP.md: clean up internal planning docs for public release
- Emphasize AI-assisted development context (Cursor, Claude, ChatGPT)
Co-authored-by: DaisukeYoda <daisukeyoda@users.noreply.github.com>
Fix critical issues in CFG builder for more accurate dead code detection:
1. Fix blockTerminates() to check only last statement
- Previously checked all statements, causing false positives
- Now only checks the final statement to avoid detecting terminators
in nested control flow (e.g., return inside if-statement within block)
- Prevents incorrect dead code detection in complex scenarios
2. Replace error logging with panic for parser bugs
- Changed from logging to fail-fast panic when elif_clause has missing fields
- Parser bugs are programming errors, not user errors, so immediate failure
with detailed error message is appropriate
- Improves debugging by failing at the exact point of error
These changes improve the accuracy and debuggability of dead code analysis
without breaking any existing functionality.
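A sketch of the corrected check, with simplified stand-in types:

```go
package analyzer

// Simplified stand-ins for the real CFG structures.
type stmt struct{ kind string } // "return", "raise", "break", "continue", "if", ...

type basicBlock struct{ statements []stmt }

// blockTerminates now inspects only the final statement. Scanning every
// statement wrongly treated a block as terminating when a return sat
// inside nested control flow (e.g. inside an if within the block).
func blockTerminates(b *basicBlock) bool {
	if len(b.statements) == 0 {
		return false
	}
	switch b.statements[len(b.statements)-1].kind {
	case "return", "raise", "break", "continue":
		return true
	}
	return false
}
```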
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixes #117
## Problem
Dead code detection failed to identify unreachable code after exhaustive
if-elif-else chains where all branches terminate with return/raise/break/
continue statements.
## Root Cause
1. Parser wasn't properly handling elif_clause and else_clause nodes from
tree-sitter, causing else clauses to be lost in elif chains
2. CFG builder always created fall-through edges to merge blocks even
when all conditional branches terminated
## Solution
### Parser Fixes (ast_builder.go)
- Added buildElifClause() to properly parse elif nodes with Test/Body/Orelse
- Added buildElseClause() to parse else nodes with Body
- Modified buildIfStatement() to collect all alternative branches and chain
them correctly via attachElseToElifChain()
- Tree-sitter returns both elif_clause and else_clause as "alternative"
  field siblings; both are now properly collected and linked
### CFG Builder Enhancements (cfg_builder.go)
- Added blockTerminates() to check if block ends with terminating statements
- Added allBranchesTerminate() to check if all conditional branches terminate
- Modified processIfStatement() to detect exhaustive termination and create
unreachable blocks instead of connecting to merge
- Modified processIfStatementElif() with same termination detection
- Updated convertElifClauseToIf() to use parser-populated fields
- Both functions now handle else_clause nodes by extracting their Body
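A sketch of the exhaustive-termination check, with simplified stand-ins for the CFG structures:

```go
package analyzer

// terminates[i] reports whether branch i ends in return/raise/break/continue,
// as determined by a blockTerminates-style check on its final statement.
func allBranchesTerminate(terminates []bool, hasElse bool) bool {
	if !hasElse {
		return false // without an else branch, execution can still fall through
	}
	for _, t := range terminates {
		if !t {
			return false
		}
	}
	return true
}

// When this returns true, processIfStatement links the code after the
// if/elif/else chain to an unreachable block instead of a merge block,
// so the dead code analysis reports it.
```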
### Test Updates (dead_code_test.go)
- Updated UnreachableElif: expectedDead 0 → 1
- Added ExhaustiveIfElseReturn test case
- Added NestedExhaustiveReturn test case
- Added MixedTerminators test case (return/raise mix)
- Updated ComplexControlFlow: expectedDead 3 → 4 (more accurate)
## Testing
✅ All 18 dead code detection tests passing
✅ All analyzer tests passing (16.8s)
✅ All parser tests passing (0.26s)
## Pattern
Follows existing unreachable block creation pattern used for return/break/
continue statements (lines 323-324, 738-740, 759-761 in cfg_builder.go).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>