pyscn

mirror of https://github.com/ludo-technologies/pyscn.git synced 2025-10-06 00:59:45 +03:00

Author	SHA1	Message	Date
DaisukeYoda	1ccdfe7bb7	fix: add missing CloneConfig validation call The Config.Validate() method was not calling CloneConfig.Validate(), which could allow invalid clone detection configurations to pass through. This fix ensures proper validation of all clone-related settings including thresholds, filtering, and analysis parameters.	2025-10-05 21:10:08 +09:00
DaisukeYoda	182bac8b71	feat: tighten CBO coupling thresholds to industry standards Updated CBO (Coupling Between Objects) analysis to use industry-standard thresholds for more accurate code quality assessment: CBO Risk Thresholds: - Low Risk: CBO ≤ 3 (was ≤ 5) - Medium Risk: 3 < CBO ≤ 7 (was 5 < CBO ≤ 10) - High Risk: CBO > 7 (was > 10) Coupling Ratio Thresholds: - Low: 5% of classes (was 10%) - Medium: 15% of classes (was 30%) - High: 30% of classes (was 50%) Coupling Penalties: - Low: 6 points (was 5) - Medium: 12 points (was 10) - High: 20 points (was 16, now aligned with other high penalties) These changes make coupling detection more strict and aligned with industry best practices, helping identify maintainability issues earlier. Files Updated: - domain/analyze.go: Updated coupling ratio thresholds and penalties - domain/cbo.go: Updated default CBO thresholds with industry standard comments - internal/analyzer/cbo.go: Updated default CBO options - domain/analyze_test.go: Updated test expectations for new thresholds - internal/analyzer/cbo_test.go: Updated test expectations 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-05 17:27:38 +09:00
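Note on 182bac8b71: the CBO risk bands listed in the message amount to a three-way classification. The following is a minimal sketch in Python, not pyscn's Go code; the function name `cbo_risk` and its parameters are hypothetical, and only the boundary values (3 and 7) come from the commit message.

```python
def cbo_risk(cbo: int, low_max: int = 3, medium_max: int = 7) -> str:
    """Classify a CBO count into the risk bands quoted in the commit message.

    low:    CBO <= 3
    medium: 3 < CBO <= 7
    high:   CBO > 7
    """
    if cbo < 0:
        raise ValueError("CBO cannot be negative")
    if cbo <= low_max:
        return "low"
    if cbo <= medium_max:
        return "medium"
    return "high"


if __name__ == "__main__":
    for value in (0, 3, 4, 7, 8):
        print(value, cbo_risk(value))
```

Passing the previous boundaries (`low_max=5, medium_max=10`) reproduces the older, more lenient bands for comparison.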
DaisukeYoda	a6cfdbeb24	fix: unify LSH threshold to 0.50 across all defaults and update tests ## Changes 1. Unified LSH Similarity Threshold to 0.50 - domain/clone.go: 0.88 → 0.50 - internal/analyzer/clone_detector.go: 0.88 → 0.50 - internal/config/clone_config.go: 0.88 → 0.50 - cmd/pyscn/init.go: 0.88 → 0.50 - .pyscn.toml: already 0.50 2. Updated Test Cases for New Duplication Thresholds - Adjusted expected scores for changed penalty calculations - Updated grade expectations (Low=3%, Med=10%, High=20%) ## Impact All default values now consistently use 0.50, ensuring: - Consistent behavior regardless of config file presence - More recall in clone detection without sacrificing precision - APTED verification still provides final accuracy 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-05 17:03:55 +09:00
DaisukeYoda	86cdb2a2be	fix: increase LSH similarity threshold from 0.78 to 0.88 ## Problem LSH (Locality-Sensitive Hashing) mode was producing inflated similarity scores compared to APTED mode, resulting in false positive clone detection. ## Root Cause MinHash Jaccard similarity tends to be more lenient than APTED tree edit distance similarity due to: - Coarse feature extraction (maxSubtreeHeight: 3, kGramSize: 4) - Set-based similarity ignoring structural differences - Common patterns (If, For, Assign) creating false overlaps ## Solution Increased LSHSimilarityThreshold from 0.78 to 0.88 across all layers: - internal/analyzer/clone_detector.go:216 - domain/clone.go:350 - internal/config/clone_config.go:200 - cmd/pyscn/init.go:84 (template) The new threshold (0.88) is Type2Threshold (0.85) + 0.03, ensuring: - More strict candidate filtering in LSH stage - Reduced false positives - Final APTED verification still provides accurate similarity ## Impact - Fragments >= 500: LSH mode now produces comparable results to APTED - Better precision without sacrificing recall - Maintains performance benefits of LSH acceleration 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-05 16:29:21 +09:00
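Note on 86cdb2a2be: the "set-based similarity ignoring structural differences" point can be illustrated with exact Jaccard similarity over k-gram sets. This is a sketch under stated assumptions: it computes exact Jaccard rather than a MinHash estimate, uses k = 2 for brevity (the commit cites kGramSize 4), and the helper names `k_grams` and `jaccard` are hypothetical.

```python
def k_grams(tokens, k):
    """Return the set of contiguous k-token windows in a token sequence."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two fragments built from the same common node pattern but of different size.
short_fragment = ["If", "Assign", "If", "Assign"]
long_fragment = ["If", "Assign", "If", "Assign", "If", "Assign"]

# Both collapse to the same k-gram set, so the set-based score is maximal
# even though the trees differ in size.
similarity = jaccard(k_grams(short_fragment, 2), k_grams(long_fragment, 2))
```

A tree edit distance would still charge for the extra nodes in the longer fragment, which is consistent with the message's observation that the LSH stage scores more leniently than APTED.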
DaisukeYoda	9d2ffe8b07	fix: improve test encapsulation with SetBatchSizeLarge method - Add SetBatchSizeLarge() method to CloneDetector for controlled access - Update clone_memory_test.go to use setter instead of direct field access - Maintains encapsulation of cloneDetectorConfig struct	2025-10-05 15:31:41 +09:00
DaisukeYoda	f6593fb405	chore: fix lint formatting issues	2025-10-05 15:02:46 +09:00
DaisukeYoda	36c7c4a9e5	fix: improve encapsulation and remove dead code Address issues found in code review: 1. Fix encapsulation violation: - Change CloneDetectorConfig embedding from public to private - Add SetUseLSH() method for controlled access - Update service layer to use SetUseLSH() instead of direct field access 2. Remove dead code: - Delete unused MemoryLimit field from CloneDetectorConfig - Remove from type definition and default configuration This prevents external packages from directly accessing internal configuration fields while maintaining all functionality. All tests pass. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-05 14:50:14 +09:00
DaisukeYoda	5e263c39f2	refactor: simplify CloneDetector by embedding config directly Remove indirection and deprecated code related to CloneDetectorConfig: - Embed CloneDetectorConfig directly into CloneDetector struct - Replace all cd.config.X with cd.X (44 occurrences) - Remove unused NewCloneDetectorFromConfig() function - Remove DEPRECATED comments from CloneDetectorConfig and NewCloneDetector - Update service layer to use detector.UseLSH directly - Clean up unnecessary comments in clone_adapters.go This simplifies the architecture while maintaining all functionality. All tests pass. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-05 14:30:35 +09:00
DaisukeYoda	29a6498beb	refactor: unify config architecture and remove technical debt (issue #124 ) This commit implements a major cleanup of the configuration system, removing redundant structures and unifying the config architecture for the v1.0 release. ## Configuration Unification ### Unified Config Format - Migrate pyproject.toml from nested to flat structure - Before: [tool.pyscn.clone.analysis], [tool.pyscn.clone.thresholds], etc. - After: [tool.pyscn.clones] with all fields flattened - Both .pyscn.toml and pyproject.toml now use identical flat structure - Update init.go template to use unified [clones] section format ### Removed Duplicate Structures 1. pyproject_loader.go (-200 lines) - Removed: PyprojectCloneConfig and 4 sub-structs - Unified: PyscnConfig.Clones now uses ClonesConfig directly - Removed: mergePyprojectConfigs, use shared mergeClonesSection 2. toml_loader.go (-150 lines) - Removed: PyscnTomlAnalysisConfig, PyscnTomlFilteringConfig, etc. - Simplified: PyscnTomlConfig.Clones only - Merged: Shared mergeClonesSection for both loaders 3. config.go (-170 lines) - Removed: Config.CloneDetection field (deprecated) - Removed: CloneDetectionConfig type and all methods - Removed: validateCloneDetectionConfig and helper methods - Updated: clone_config_loader.go to use unified Clones field ### Removed Legacy Code 1. domain/clone.go - Removed: CloneRequest.UseLSH field (deprecated) - Removed: Related validation logic 2. analyzer/lsh_index.go - Removed: Duplicate LSHConfig struct (unused) - Use: config.LSHConfig as single source of truth 3. constants/clone_thresholds.go (-60 lines) - Removed: CloneThresholdConfig struct - Removed: DefaultCloneThresholds() function - Removed: ValidateThresholds() and GetThresholdForType() methods - Inlined: Validation logic into config.ThresholdConfig.Validate() 4. 
service/dead_code_formatter.go - Removed: formatFindingTextLegacy method - Inlined: Formatting logic into FormatFinding ### Test Updates - Updated clone_adapters_test.go: removed tests for deleted types - Simplified clone_thresholds_test.go: removed validation tests - Updated new_commands_test.go: verify [clones] section format - All tests pass: 100% success rate ## Impact ### Code Reduction - Deleted lines: ~1,056 lines - Deleted structs: 12 config structs - Deleted functions: 8 functions/methods - Deleted tests: 6 obsolete test functions ### Architecture Improvements - Single source of truth for clone configuration - No more duplicate config structs across layers - Consistent config format between .pyscn.toml and pyproject.toml - Cleaner separation of concerns ### Remaining Technical Debt Documented for future cleanup: - analyzer.GroupingConfig (design issue with Type1-4Threshold fields) - analyzer.CloneDetectorConfig (used in 21 locations, needs planned migration) - ClonesConfig flat structure (TOML-specific, needs careful analysis) Closes #124 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-05 13:48:17 +09:00
DaisukeYoda	927d472aaf	chore: fix lint formatting issues	2025-10-05 12:35:22 +09:00
DaisukeYoda	d6a6fc7e0a	fix: implement LSH auto-enable from config file ## Problem LSH (Locality-Sensitive Hashing) acceleration was not working because: 1. LSH settings in [clones] section of .pyscn.toml were not being loaded 2. toml_loader.go expected nested [lsh] section, but config used flat structure 3. cloneConfigToCloneRequest() was not converting LSH settings to CloneRequest 4. Auto-enable logic based on fragment count was not implemented This caused clone detection to always use slow APTED algorithm, even for large projects where LSH would provide significant speedup. ## Solution 1. Added ClonesConfig struct to read flat [clones] section structure 2. Implemented mergeClonesSection() to load all settings including LSH 3. Extended CloneRequest with LSH fields (LSHEnabled, LSHAutoThreshold, etc.) 4. Added auto-enable logic in clone_service.go: - "auto": enable LSH when fragments >= threshold (default: 500) - "true": always enable LSH - "false": always disable LSH 5. Added diagnostic messages showing LSH decision ## Changes - domain/clone.go: Add LSH config fields to CloneRequest - internal/config/toml_loader.go: Add ClonesConfig struct and merge logic - service/clone_config_loader.go: Convert LSH settings to CloneRequest - service/clone_service.go: Implement auto-enable logic based on fragment count - .pyscn.toml: Document LSH settings (no functional change) ## Testing - Verified LSH auto-detection with different thresholds - Confirmed settings load correctly from .pyscn.toml - All existing tests pass ## Related - Fixes issue discovered during performance investigation - Prepares for config refactoring in #124 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-05 12:31:06 +09:00
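Note on d6a6fc7e0a: the three LSH modes described in the message reduce to a small decision function. A minimal sketch in Python, assuming the mode arrives as one of the strings "auto", "true" or "false"; the name `should_use_lsh` and the parameter names are hypothetical, while the default threshold of 500 fragments is taken from the message.

```python
def should_use_lsh(mode: str, fragment_count: int, auto_threshold: int = 500) -> bool:
    """Decide whether LSH acceleration is enabled for a clone-detection run."""
    if mode == "true":
        return True          # always enable
    if mode == "false":
        return False         # always disable
    if mode == "auto":
        # enable only once the project is large enough to benefit
        return fragment_count >= auto_threshold
    raise ValueError(f"unknown LSH mode: {mode!r}")


if __name__ == "__main__":
    print(should_use_lsh("auto", 499))   # below the threshold
    print(should_use_lsh("auto", 500))   # at the threshold
```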
DaisukeYoda	a65a5a6f4c	refactor: prioritize .pyscn.toml over pyproject.toml for configuration Align with Ruff's behavior where dedicated config files take precedence over pyproject.toml. This makes the configuration priority more intuitive: when users explicitly create a .pyscn.toml file, it should override any settings in pyproject.toml. Changes: - Update TomlConfigLoader.LoadConfig() to check .pyscn.toml first - Update GetSupportedConfigFiles() to reflect new priority order - Update documentation in ARCHITECTURE.md to reflect new behavior This is a breaking change for projects that have both .pyscn.toml and pyproject.toml with different settings. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-05 10:20:54 +09:00
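Note on a65a5a6f4c: the new lookup order can be sketched as "first match in a priority list wins". This is an illustration only; `pick_config` and `CONFIG_PRIORITY` are hypothetical names, and the sketch ignores whether a `pyproject.toml` actually contains any pyscn settings.

```python
# Dedicated config file first, pyproject.toml second, as described in the commit.
CONFIG_PRIORITY = (".pyscn.toml", "pyproject.toml")


def pick_config(existing_files):
    """Return the highest-priority config file name present, or None for defaults."""
    for name in CONFIG_PRIORITY:
        if name in existing_files:
            return name
    return None


if __name__ == "__main__":
    print(pick_config({"pyproject.toml", ".pyscn.toml", "README.md"}))
```

With both files present, the dedicated file is chosen, which is the breaking change the message warns about for projects whose two files disagree.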
DaisukeYoda	0fb2821765	refactor: improve dead code detection robustness Fix critical issues in CFG builder for more accurate dead code detection: 1. Fix blockTerminates() to check only last statement - Previously checked all statements, causing false positives - Now only checks the final statement to avoid detecting terminators in nested control flow (e.g., return inside if-statement within block) - Prevents incorrect dead code detection in complex scenarios 2. Replace error logging with panic for parser bugs - Changed from logging to fail-fast panic when elif_clause has missing fields - Parser bugs are programming errors, not user errors, so immediate failure with detailed error message is appropriate - Improves debugging by failing at the exact point of error These changes improve the accuracy and debuggability of dead code analysis without breaking any existing functionality. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 10:09:51 +09:00
DaisukeYoda	01f3bfb764	fix: detect dead code after exhaustive if-elif-else chains Fixes #117 ## Problem Dead code detection failed to identify unreachable code after exhaustive if-elif-else chains where all branches terminate with return/raise/break/ continue statements. ## Root Cause 1. Parser wasn't properly handling elif_clause and else_clause nodes from tree-sitter, causing else clauses to be lost in elif chains 2. CFG builder always created fall-through edges to merge blocks even when all conditional branches terminated ## Solution ### Parser Fixes (ast_builder.go) - Added buildElifClause() to properly parse elif nodes with Test/Body/Orelse - Added buildElseClause() to parse else nodes with Body - Modified buildIfStatement() to collect all alternative branches and chain them correctly via attachElseToElifChain() - Tree-sitter returns both elif_clause and else_clause as "alternative" field siblings, now both are properly collected and linked ### CFG Builder Enhancements (cfg_builder.go) - Added blockTerminates() to check if block ends with terminating statements - Added allBranchesTerminate() to check if all conditional branches terminate - Modified processIfStatement() to detect exhaustive termination and create unreachable blocks instead of connecting to merge - Modified processIfStatementElif() with same termination detection - Updated convertElifClauseToIf() to use parser-populated fields - Both functions now handle else_clause nodes by extracting their Body ### Test Updates (dead_code_test.go) - Updated UnreachableElif: expectedDead 0 → 1 - Added ExhaustiveIfElseReturn test case - Added NestedExhaustiveReturn test case - Added MixedTerminators test case (return/raise mix) - Updated ComplexControlFlow: expectedDead 3 → 4 (more accurate) ## Testing ✅ All 18 dead code detection tests passing ✅ All analyzer tests passing (16.8s) ✅ All parser tests passing (0.26s) ## Pattern Follows existing unreachable block creation pattern used for return/break/ continue 
statements (lines 323-324, 738-740, 759-761 in cfg_builder.go). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 09:29:37 +09:00
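Note on 01f3bfb764 and 0fb2821765: the two rules described there (a block terminates only if its last statement does, and an if-chain terminates only if every branch including an else does) can be sketched with Python's standard `ast` module. This is not pyscn's implementation, which builds a CFG from a tree-sitter parse in Go; the function names mirror the commit message but the code is an assumption-laden illustration that only inspects `body` lists.

```python
import ast

TERMINATORS = (ast.Return, ast.Raise, ast.Break, ast.Continue)


def block_terminates(stmts):
    """True if the LAST statement of the block always leaves the block."""
    if not stmts:
        return False
    last = stmts[-1]
    if isinstance(last, TERMINATORS):
        return True
    if isinstance(last, ast.If):
        return all_branches_terminate(last)
    return False


def all_branches_terminate(if_node):
    """True if the chain has an else and every branch terminates."""
    if not if_node.orelse:          # no else: control can fall through
        return False
    return block_terminates(if_node.body) and block_terminates(if_node.orelse)


def unreachable_lines(source):
    """Line numbers of statements that follow a terminating statement."""
    dead = set()
    for node in ast.walk(ast.parse(source)):
        body = getattr(node, "body", None)
        if not isinstance(body, list):
            continue
        for i, stmt in enumerate(body[:-1]):
            if block_terminates([stmt]):
                dead.update(s.lineno for s in body[i + 1:])
                break
    return sorted(dead)


EXHAUSTIVE = "\n".join([
    "def grade(x):",
    "    if x > 90:",
    "        return 'A'",
    "    elif x > 50:",
    "        return 'B'",
    "    else:",
    "        raise ValueError(x)",
    "    print('never runs')",
])
```

Calling `unreachable_lines(EXHAUSTIVE)` flags the final `print` on line 8, while a function whose `if` has no `else` is left alone because control can still fall through.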
DaisukeYoda	352744f3dc	Remove incomplete quality analysis feature (#113 ) * Remove incomplete quality analysis feature The quality analysis feature was using fixed/mock values and incomplete implementations, making it misleading for users. This commit removes: - QualityMetricsResult and RefactoringTarget domain types - AnalyzeQuality field from SystemAnalysisRequest - AnalyzeQuality method from service interface and implementation - Maintainability and TechnicalDebt calculations (always returned fixed values) - Quality formatting in all output formats (text, JSON, CSV, HTML) - Quality configuration options - Deprecated unused helper functions that were kept for backward compatibility The removal focuses the codebase on its core functionality: - Dependency analysis with circular dependency detection - Architecture validation and layer analysis - Robert Martin's coupling metrics (Ca, Ce, I, A, D) All tests pass and the codebase is cleaner and more maintainable. Fixes #109 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix CI: Remove MaintainabilityIndex assertion from system_metrics_test The MaintainabilityIndex field was removed as part of the quality analysis cleanup, but a new test file was added to main after the PR was created. This commit fixes the test to remove the assertion on the deleted field. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * chore: drop quality config toggle --------- Co-authored-by: DaisukeYoda <daisukeyoda@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-09-28 12:28:30 +09:00
DaisukeYoda	5e4dc6ad35	feat: add CFG support for Python comprehensions Implements Phase 1 of Issue #34 - CFG support for Python comprehensions. - Add processComprehension() method to model comprehensions as implicit loops - Support list/dict/set comprehensions and generator expressions - Handle filter conditions with proper conditional edges - Fix AST builder to correctly extract if clauses in comprehensions - Add comprehensive test coverage for all comprehension types The implementation models comprehensions with the following CFG structure: init -> loop_header -> [filter] -> body -> append -> loop_header -> exit This significantly improves static analysis accuracy for Python code that uses comprehensions, which are a common Python idiom. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-20 16:30:41 +09:00
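Note on 5e4dc6ad35: the CFG shape given in the message (init -> loop_header -> [filter] -> body -> append -> loop_header -> exit) corresponds to the explicit loop a comprehension stands for. The sketch below spells out that correspondence in plain Python; the function names are hypothetical and the comments map each statement to the node names used in the message.

```python
def squares_of_evens_comprehension(values):
    return [v * v for v in values if v % 2 == 0]


def squares_of_evens_explicit(values):
    result = []                     # init
    iterator = iter(values)
    while True:                     # loop_header
        try:
            v = next(iterator)
        except StopIteration:
            break                   # exit
        if v % 2 == 0:              # filter (conditional edge)
            item = v * v            # body
            result.append(item)     # append, then back to loop_header
    return result


if __name__ == "__main__":
    print(squares_of_evens_explicit(range(6)))
```

Both functions return the same list for any input, which is why modeling the comprehension as an implicit loop gives the analyzer the same branch structure it already handles for `for` statements.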
DaisukeYoda	bbd48ab03c	fix: address linter issues - Remove unnecessary nil check for slice (S1009) - Replace conditional TrimSuffix with unconditional call (S1017) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-19 21:10:19 +09:00
DaisukeYoda	555add216f	fix(analyzer): restore plain import resolution	2025-09-19 20:01:48 +09:00
DaisukeYoda	235d59f948	fix: resolve stub state of system analysis and improve architecture validation - Implement actual system analysis functionality replacing stub returns: * Add extractModuleMetrics for real dependency metrics * Add generateCycleBreakingSuggestions for circular dependency resolution * Add classifyModulesByQuality for module quality classification * Add generateRefactoringTargets with priority calculation * Add identifyHotSpots for complex coupled modules - Enable quality analysis in analyze command (AnalyzeQuality: true) - Fix dependency resolution for project modules: * Add resolveAbsoluteImportWithProject to prioritize project modules * Search current directory, project root, and parent directory - Add configuration loading to deps command: * Load architecture rules from config files * Merge CLI flags with configuration - Improve architecture layer detection accuracy: * Prioritize last part of module path (e.g., "router" → presentation) * Remove business-specific names from layer patterns * Add project prefix support in pattern matching This resolves the issue where system analysis was returning mostly empty values for dependencies, quality metrics, and recommendations. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-19 19:23:28 +09:00
DaisukeYoda	ab522ca407	feat(clone): improve clone detection grouping and display (#97 ) * feat(clone): improve clone detection grouping and display Major improvements to clone detection quality and performance: Display improvements: - Changed "Clone Pairs" to "Unique Fragments" for accurate representation - Added TotalClones field to AnalyzeSummary struct - Fixed misleading 10000 pairs display (was showing MaxClonePairs limit) Grouping algorithm improvements: - Implemented new Centroid-based grouping algorithm to avoid transitive similarity issues - Changed default grouping from Connected Components to K-Core for better balance - K-Core provides better quality groups while maintaining good performance Threshold adjustments for better clone quality: - Type3 threshold: 0.70 → 0.80 - Type4 threshold: 0.60 → 0.75 Results: - Before: 417 clones with 11% average similarity (poor quality groups) - After: 26 clones with 81% average similarity (high quality groups) Available grouping modes: - k_core (default): Good balance of quality and performance - centroid: High quality but slower (O(n²), best for small projects) - star: Medium quality and speed - complete_linkage: High precision but slow - connected: Fastest but lowest quality 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: address code review feedback on clone detection - Add MaxClonePairs limit check in generatePairsFromGroups function - Add explanatory comments for dual Centroid implementations - Clarify that clone_detector.go has optimized direct implementation - Ensure centroid_grouping.go maintains GroupingStrategy compatibility * refactor: use unified GroupingStrategy interface for all grouping modes - Remove special handling for Centroid mode in DetectClonesWithContext - Delete redundant detectClonesWithCentroid method - All grouping strategies now use unified CreateGroupingStrategy interface - Optimize CentroidGrouping with similarity index for better 
performance - Maintain architectural consistency with strategy pattern * test: update tests for new clone type thresholds - Update Type3 threshold tests from 0.70 to 0.80 - Update Type4 threshold tests from 0.60 to 0.75 - Adjust test expectations for ClassifyCloneType: - 0.75 similarity now maps to Type4 (was Type3) - 0.65 similarity is no longer a valid clone (was Type4) - Update IsSignificantClone test for 0.60 threshold (now below minimum) * fix: resolve lint and test failures in PR #97 - Remove unused methods determineGroupCloneType and generatePairsFromGroups from clone_detector.go - Update test expectations to match new clone thresholds (Type-3: 0.80, Type-4: 0.75) - Fix test cases in clone_detector_test.go for threshold value tests These changes ensure all CI checks pass for the improved clone detection grouping feature. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: resolve memory leak and performance issues in centroid grouping - Replace pointer-based map keys with string keys to fix memory leak - Add group size limit (50) for performance optimization on large codebases - Use makePairKey function with fragmentID for consistent string-based keys - Prevent potential GC issues with CodeFragment keys in similarity index 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> fix: resolve configuration consistency issues in clone type thresholds This commit addresses the critical configuration management inconsistencies identified in PR97 where clone type thresholds had different values across multiple locations, causing unpredictable behavior. 
Changes: - Fix init.go template values to match constants (0.80→0.80, 0.75→0.75) - Remove hardcoded thresholds from centroid_grouping.go classification logic - Add dynamic threshold configuration to GroupingConfig and CentroidGrouping - Ensure clone detector passes thresholds to all grouping strategies - Add comprehensive tests for threshold consistency and configuration flow Impact: - User-configured thresholds now properly affect clone classification - Consistent behavior across CLI flags, config files, and defaults - Eliminates Type3/Type4 threshold conflicts (0.80 vs 0.75, 0.75 vs 0.65) - Predictable clone detection results matching user expectations 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * style: fix indentation and formatting issues in analyze command and related files This commit applies consistent code formatting across the analyze command infrastructure, addressing indentation inconsistencies introduced during recent refactoring work. Changes: - Fix mixed spaces/tabs indentation in cmd/pyscn/analyze.go - Standardize struct field alignment in domain/analyze.go - Correct formatting in service layer files for analyze functionality - Apply consistent code style throughout system analysis components Technical details: - Convert spaces to tabs for Go struct field alignment - Maintain consistent indentation patterns across all modified files - No functional changes, purely cosmetic formatting improvements 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: add missing fragmentID and makePairKey methods to CentroidGrouping This commit resolves undefined function references in centroid_grouping.go by adding the missing helper methods as instance methods of CentroidGrouping. 
Changes: - Add fragmentID() method to generate stable fragment identifiers - Add makePairKey() method for creating fragment pair keys - Convert function calls to method calls (c.makePairKey, c.fragmentID) - Add fmt import for string formatting Technical details: - fragmentID creates location-based IDs: "filepath|startLine|endLine|startCol|endCol" - makePairKey ensures deterministic ordering for fragment pairs - Methods are now encapsulated within CentroidGrouping for better design - Removes dependency on external fragmentID function from star_medoid_grouping.go Impact: - Fixes potential build errors when centroid grouping is used independently - Improves code organization and reduces cross-file dependencies - Maintains same functionality with better encapsulation 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: DaisukeYoda <daisukeyoda@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-09-15 21:15:09 +09:00
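Note on ab522ca407: the threshold changes and the updated test expectations (0.75 now maps to Type-4, 0.65 is no longer a clone) imply a first-match classification against descending minimums. A minimal sketch in Python, assuming inclusive comparisons; `classify_clone_type` and `THRESHOLDS` are hypothetical names. The Type-2 value of 0.85 is quoted from commit 86cdb2a2be, and the Type-1 threshold is not stated anywhere in this log, so it is deliberately left out.

```python
# (name, minimum similarity), highest first. Type-1 is omitted because its
# threshold does not appear in this commit log.
THRESHOLDS = (("Type2", 0.85), ("Type3", 0.80), ("Type4", 0.75))


def classify_clone_type(similarity, thresholds=THRESHOLDS):
    """Return the first clone type whose minimum is met, or None if below all."""
    for name, minimum in thresholds:
        if similarity >= minimum:
            return name
    return None


if __name__ == "__main__":
    for score in (0.82, 0.75, 0.65):
        print(score, classify_clone_type(score))
```

Passing the thresholds as an argument rather than hard-coding them reflects the fix described in the message, where user-configured values must reach every grouping strategy.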
DaisukeYoda	1b92f87ed6	feat(deps): Auto-detect architecture patterns for validation (#96 ) * feat(deps): add auto-detect architecture validation - Add autoDetectArchitecture() to automatically identify common layer patterns - Detect standard layers: presentation, application, domain, infrastructure - Apply default layered architecture rules when no config exists - Add --auto-detect flag (default: true) to deps command - Fix module pattern matching to include submodules - Fix context cancellation handling in architecture analysis - Improve config merging to preserve rules while applying CLI overrides This allows architecture validation without configuration files, making it easier to adopt and reducing configuration complexity. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * feat(deps): make architecture validation default enabled - Change --validate flag default from false to true - Update help text to reflect validation is now default - Users can disable with --no-validate if needed - Improves code quality by default without user intervention 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(deps): disable auto-detection by default to avoid false positives - Set --validate flag default to false (opt-in) - Set --auto-detect flag default to false (opt-in) - Fix layer patterns: remove "app" from application layer - Make auto-detected rules more permissive for real-world projects - Update help text to reflect opt-in behavior Auto-detection was causing too many false positives in real projects. Users should explicitly enable validation when their project structure matches standard patterns or when they have custom rules defined. 
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: remove --validate flag and make architecture validation always enabled - Remove depsValidate flag variable - Set AnalyzeArchitecture to always true - Update help text to reflect validation is always performed - Remove validate flag from CLI options This ensures architecture validation cannot be disabled, preventing the mistake of shipping unusable features. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(formatter): improve Architecture tab HTML layout to match Dependencies tab - Refactor Architecture HTML to use consistent GenerateMetricCard structure - Add proper status badges for violations and compliance score - Implement table-based layout for violations and layer dependencies - Add color coding for compliance score (green/yellow/red) - Show layer coupling and cohesion metrics in organized tables - Limit violations display to 20 with overflow indicator - Use consistent section headers and metric grid layout 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: add white background to tab-content for consistent appearance - Add background: white to .tab-content CSS class - Ensures both Dependencies and Architecture tabs have white backgrounds - Fixes issue where Architecture tab had transparent background showing gradient * fix: remove extra closing div tags in HTML content generation - Remove extra </div> from writeHTMLDependenciesContent - Remove extra </div> from writeHTMLArchitectureContent - Fixes tab content structure to match properly within tabs container * fix: exclude __init__.py to submodule dependencies and allow same-layer deps - Skip dependencies from __init__.py to its own submodules - This is a common Python pattern for re-exporting public APIs - Allow same-layer dependencies in auto-detected architecture rules - Reduces 
false positives from 344 to 15 violations (97.4% compliance) * fix: improve layer detection by prioritizing specific patterns - findLayerForModule now selects the most specific matching pattern - Specificity determined by number of dots in pattern (app.services > app) - Fixes misclassification of services modules as infrastructure - Violations reduced from 344 to 13 (96% reduction) - Compliance score: 97.7% * fix: remove unnecessary --auto-detect flag and improve pattern detection - Remove --auto-detect flag as auto-detection happens by default when no config exists - Add 'app' and 'application' to application layer patterns for better detection - Update help text to reflect automatic behavior The deps command now automatically detects architecture patterns when no rules are defined in config files, making the --auto-detect flag redundant. * fix: remove incorrectly added 'app' and 'application' patterns Remove the patterns that were causing all modules starting with 'app' to be misclassified as application layer, which broke the layer detection. 
* fix(arch,ui): improve layer matching specificity and HTML cards - Use original pattern specificity with tie-breaker for layer assignment - Include layer names in violation summary table - Show Violations as large metric number (not small badge) - Unify metric-grid class; harden tab switching script * feat(analyze-summary): add system Dependencies and Architecture metrics to Summary tab and clarify CBO labels - Summary now shows Total Modules, Total Dependencies, Max Depth, Cycles - Summary shows Architecture Violations, Compliance, Layers, Total Rules - Rename CBO labels to avoid confusion with module dependencies - Update tabs to use concise names (no parentheses) * feat(score): include module dependencies and architecture in Health Score - Add deps/arch metrics to AnalyzeSummary - Revise scoring to 5 domains (Complexity, Dead Code, Clones, CBO, Deps/Arch) - Pull System analysis metrics into summary during report generation * fix(analyze): correct CBO goroutine block and separate system analysis goroutine - Close missing braces and return paths in CBO analysis block - Start system (deps+arch) goroutine independently after CBO block - Prevent potential build failures from malformed block structure --------- Co-authored-by: DaisukeYoda <daisukeyoda@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-09-14 18:52:34 +09:00
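Note on 1b92f87ed6: the "most specific pattern wins" rule (specificity measured by the number of dots, so `app.services` beats `app`) can be sketched as follows. This is an illustration, not the project's code: `find_layer_for_module` mirrors the name in the message, but the pattern table is hypothetical, submodule matching is assumed to be a dotted-prefix check, and ties simply keep the first match because the message does not describe its tie-breaker.

```python
def find_layer_for_module(module, layer_patterns):
    """Return the layer whose matching pattern has the most dots, or None."""
    best_layer, best_specificity = None, -1
    for layer, patterns in layer_patterns.items():
        for pattern in patterns:
            # A pattern matches the module itself and any of its submodules.
            matches = module == pattern or module.startswith(pattern + ".")
            specificity = pattern.count(".")
            if matches and specificity > best_specificity:
                best_layer, best_specificity = layer, specificity
    return best_layer


# Hypothetical rules: a broad pattern and a more specific one that overlaps it.
EXAMPLE_PATTERNS = {
    "infrastructure": ["app"],
    "application": ["app.services"],
}

if __name__ == "__main__":
    print(find_layer_for_module("app.services.billing", EXAMPLE_PATTERNS))
```

Without the specificity comparison, `app.services.billing` would be claimed by whichever layer happened to list the broad `app` pattern first.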
DaisukeYoda	2da26eda28	chore: format code with go fmt Run go fmt on all Go files to ensure consistent formatting 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-13 15:51:51 +09:00
DaisukeYoda	178b557ed3	fix: resolve all linter issues - Fix error check for CSV writer.Write calls - Replace if-HasSuffix with strings.TrimSuffix - Remove unused append result for highComplexityModules - Remove all unused functions across codebase - Remove unused import in config loader - Fix DOT format type consistency 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-13 15:29:59 +09:00
DaisukeYoda	f38e00bf38	feat: implement system-level structural quality analysis (Phase 2) Add comprehensive dependency analysis with circular dependency detection, coupling metrics, and unified HTML reporting across all commands. - Add deps command for dependency analysis - Implement Tarjan's algorithm for circular dependency detection - Calculate Robert Martin's coupling metrics (Ca, Ce, I, A, D) - Create unified HTML template for consistent reporting - Fix import extraction with regex-based fallback - Simplify command interfaces following KISS principle - Standardize output file naming and location 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-13 14:33:25 +09:00
DaisukeYoda	db15d3cd99	output: change default report directory to .pyscn/reports under CWD; create if missing; update docs and config comments	2025-09-12 13:32:32 +09:00
DaisukeYoda	ef940c7669	refactor: remove external service references from comments Remove all "like ruff" references from code comments to avoid external dependencies in documentation. The TOML-only configuration strategy stands on its own merits without needing to reference other tools. Files updated: - service/clone_config_loader.go - service/config_loader.go - service/dead_code_config_loader.go - cmd/pyscn/clone.go - internal/config/toml_loader.go 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-12 00:22:24 +09:00
DaisukeYoda	3ceb110b73	fix: critical TOML configuration bugs and unify loading strategy This commit addresses critical bugs identified in PR review and unifies configuration loading across all commands: Critical Bug Fixes: - Fix boolean merging bug where config values always override defaults even when unset (use pointer types to detect unset vs false) - Fix YAML syntax being written to TOML files in tests - Fix missing CLI flag processing causing validation bypass - Update help examples from deprecated --format to new --json syntax - Fix test expectations for TOML section syntax Configuration Unification: - Unify all commands to use TOML-only configuration loading - Eliminate double-loading conflicts between clone/deadcode/complexity - Remove remaining YAML/JSON config file discovery - Ensure consistent pyproject.toml > .pyscn.toml > defaults priority All previously failing tests now pass with proper TOML configuration handling and unified loading strategy across the entire codebase. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-11 23:54:30 +09:00
DaisukeYoda	73deb3bd3e	feat: unify configuration to TOML-only like ruff This PR follows ruff's approach to configuration management by: 1. Prioritizing pyproject.toml over dedicated config files 2. Removing YAML/JSON configuration support completely 3. Hiding complex algorithm flags from CLI help 4. Providing simple presets for common use cases ## Key Changes ### New Files - `internal/config/pyproject_loader.go` - pyproject.toml integration - `internal/config/toml_loader.go` - TOML-only configuration loader - `.pyscn.toml` - example dedicated config file ### Configuration Priority (like ruff) 1. CLI arguments (highest priority) 2. pyproject.toml [tool.pyscn] section 3. .pyscn.toml dedicated config file 4. Built-in defaults (lowest priority) ### Simplified CLI - Hidden complex algorithm flags (accessible but not in help) - Simple presets: --fast, --precise, --balanced - Focus on essential user options only ### TOML Examples ```toml # pyproject.toml [tool.pyscn.clone] [tool.pyscn.clone.analysis] min_lines = 5 # .pyscn.toml [analysis] min_lines = 5 ``` ## Benefits - Consistent with modern Python tooling ecosystem - Simplified user experience for new users - Backward compatibility for existing scripts - Configuration consolidation in pyproject.toml 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-11 23:21:35 +09:00
DaisukeYoda	0832fd8bd7	chore(lint): remove unused helpers and fix rune-to-string conversions in benches	2025-09-11 15:32:23 +09:00
DaisukeYoda	ab591393a0	feat: LSH-based clone detection acceleration (Issue #82) - Add AST feature extractor (subtree hashes, k-grams, patterns) - Implement MinHash + LSH banding index - Integrate LSH two-stage pipeline in CloneDetector - CLI flags: --use-lsh, --lsh-threshold, --lsh-bands, --lsh-rows, --lsh-hashes - Service wiring + domain config - Unit, integration tests and benches Opt-in and backward compatible.	2025-09-11 13:42:46 +09:00
DaisukeYoda	bf3b0e100f	perf(complete-linkage): add early exit when any inter-cluster pair < threshold (Issue #81)	2025-09-11 11:20:55 +09:00
DaisukeYoda	33be496f16	feat(analyzer): add grouping modes (connected, star, complete_linkage, k_core) and CLI flags; integrate with detector and service (Issue #81)	2025-09-11 09:17:27 +09:00
DaisukeYoda	af65e1fd13	chore(lint): fix lints in new analyzer files (formatting, nolint for hook)	2025-09-11 01:49:09 +09:00
DaisukeYoda	15cb8b2b95	feat(analyzer): implement Star/Medoid grouping strategy and tests (Issue #80) - Add GroupingStrategy interface - Implement StarMedoidGrouping with similarity cache, convergence, deterministic tie-breakers - Add tests: simple, multiple groups, none, large-scale cases - Add strategy hook into CloneDetector for backward compatibility Perf: O(n×k) with cached similarities, early convergence Notes: excludes singleton groups; medoid ties resolved by location order	2025-09-11 01:37:41 +09:00
DaisukeYoda	6b50848f75	refactor: rename project from pyqol/pyscan to pyscn - Update all references from pyqol and pyscan to pyscn - Rename directories: cmd/pyqol → cmd/pyscn, python/src/pyscan → python/src/pyscn - Update Go module path to github.com/ludo-technologies/pyscn - Update PyPI package name to pyscn - Update all documentation and configuration files - Successfully published to PyPI as pyscn This change provides a shorter, more memorable package name (5 characters) that is easier to type in commands like 'pyscn analyze' while maintaining the core meaning of Python code scanning/analysis.	2025-09-08 01:31:33 +09:00
DaisukeYoda	ed5a277d00	refactor: remove unused IncludeThirdParty field from CBOOptions - Remove IncludeThirdParty field from CBOOptions struct in analyzer - Remove IncludeThirdParty from CBOAnalysisOptions in domain - Update DefaultCBOOptions to remove unused field initialization - Clean up buildCBOOptions method in service layer - Update tests to remove references to deleted field This field was defined but never used anywhere in the codebase, causing code bloat and potential confusion. The field can be re-implemented later if third-party dependency filtering is needed. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-06 01:40:25 +09:00
DaisukeYoda	3c387eee48	fix: resolve CBO function call false positives and nested call detection - Replace string parsing with structural AST analysis for Call nodes - Add extractClassNameFromCallNode and extractClassNameFromAttribute methods - Fix function calls being incorrectly counted as class dependencies - Enhance walkNode to traverse Args and Value fields for nested calls - Only count actual class instantiations (local, imported, builtins) as dependencies - Ensure A(B()) detects both A and B dependencies correctly 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-06 01:25:58 +09:00
Daisuke Yoda	92bedbe3dc	fix: separate built-in types from functions in CBO analysis - Add builtinFunctions map to CBOAnalyzer struct - Separate built-in types (list, int, dict) from built-in functions (print, len, str) - Always exclude built-in functions from dependencies regardless of IncludeBuiltins setting - Built-in types only included when IncludeBuiltins=true - Fix test failures: Logger class now CBO=0, builtin test now CBO=3 - Add isBuiltinFunction() helper method for cleaner dependency filtering 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-06 00:23:04 +09:00
Daisuke Yoda	3521522955	fix: resolve CBO analyzer test failures from code review Fixed critical issues identified in code review: - Type annotation processing: Added support for tree-sitter specific nodes (generic_type, type_parameter) and proper handling of subscript types like List[User] - Instantiation detection: Fixed NodeCall processing to handle assignment-based instantiation (self.logger = Logger()) by parsing Call(Name(...)) format - Wildcard pattern matching: Implemented regex-based matching for patterns like Test using regexp.MatchString - Removed unused ImportDependencies field from CBOResult and CBOMetrics structs and all related code - Changed zero classes error policy to return valid empty response with warnings instead of error - Fixed builtin types handling for both type annotations and instantiation contexts All CBO analyzer tests now pass (24/24). End-to-end testing confirms proper detection of: - Class inheritance dependencies - Type annotation dependencies (including generics) - Object instantiation dependencies - Builtin type usage - Pattern-based filtering - Risk level assessment 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-05 22:15:19 +09:00
Daisuke Yoda	b6205e5429	feat: implement CBO (Coupling Between Objects) metric analysis Implements comprehensive CBO analysis functionality to measure class coupling in Python codebases. This addresses issue #43 by providing developers with quantitative insights into object-oriented design quality. Features: - Complete CBO metric calculation using AST analysis - Dependency detection for inheritance, type hints, instantiation, and attribute access - Risk level assessment (low/medium/high) with configurable thresholds - Multiple output formats: text, JSON, YAML, CSV, HTML - Flexible filtering and sorting options - Command-line interface consistent with existing commands - Comprehensive test coverage Architecture: - domain/cbo.go: Core business models and interfaces - internal/analyzer/cbo.go: CBO calculation algorithms - service/cbo_service.go: Business logic implementation - service/cbo_formatter.go: Multi-format output generation - app/cbo_usecase.go: Application workflow orchestration - cmd/pyqol/cbo.go: CLI command integration Usage: pyqol cbo src/ # Analyze all Python files pyqol cbo --json src/ # JSON output pyqol cbo --html src/ # HTML report pyqol cbo --min-cbo 5 src/ # Filter high coupling classes 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-05 19:46:07 +09:00
Daisuke Yoda	312e6fafa6	fix: resolve viper race condition in parallel config loading Replace global viper instance with new instances in loadConfigFromFile and SaveConfig functions to prevent race conditions during concurrent execution of analyze command. Fixes #62 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-31 23:36:22 +09:00
Daisuke Yoda	3148305b1f	fix: resolve three critical logic bugs - Add error handling to resolveOutputDirectory function - Fix Windows path traversal with volume-aware termination - Fix test race conditions by setting cmd.Dir in E2E tests 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-30 23:53:13 +09:00
Daisuke Yoda	08666a2aff	refactor: implement Separation of Concerns (SoC) for better code organization ## Overview Refactored functions to follow Single Responsibility Principle (SRP) by separating mixed concerns into focused, single-purpose functions, improving maintainability and testability. ## Key Changes ### 1. utils.go - Filename Generation Separation Before: `generateFileNameWithTarget()` had 3 mixed responsibilities - Timestamp filename generation - Configuration loading - Output directory resolution After: Split into 3 focused functions - `generateTimestampedFileName()` - filename generation only - `resolveOutputDirectory()` - directory resolution only - `generateOutputFilePath()` - workflow orchestration (delegates to above) ### 2. config.go - Configuration Loading Separation Before: `LoadConfigWithTarget()` had 3 mixed responsibilities - Configuration file discovery - File reading and parsing - Configuration validation After: Split into 3 focused functions - `discoverConfigFile()` - file discovery only - `loadConfigFromFile()` - file loading and parsing only - `LoadConfigWithTarget()` - workflow orchestration (delegates to above) ## SRP Benefits Achieved - Single Responsibility: Each function has one clear purpose - Testability: Individual concerns can be unit tested separately - Reusability: Components can be used independently in other contexts - Maintainability: Changes affect only relevant functions - Readability: Function names clearly indicate their purpose ## Impact - Functions now follow SRP and SoC principles - No functional changes - pure refactoring - All existing interfaces preserved - Enhanced code organization and maintainability ## Validation - All tests pass (no regression) - No linting issues - Clean separation of concerns achieved 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-30 23:46:27 +09:00
Daisuke Yoda	1a47c66ed0	refactor: eliminate DRY violations in configuration and testing code ## Overview Comprehensive refactoring to eliminate code duplication across command files, E2E tests, and configuration search logic, improving maintainability and following DRY (Don't Repeat Yourself) principles. ## Key Improvements ### 1. Command Files Deduplication - Eliminated: 4 duplicate `targetPath` extraction blocks - Added: `getTargetPathFromArgs()` helper function in utils.go - Updated: analyze.go, clone.go, complexity_clean.go, dead_code.go - Benefit: Single point of change for path extraction logic ### 2. E2E Test Helper Creation - Eliminated: 5 duplicate config file creation blocks - Added: `createTestConfigFile()` helper in e2e/helpers.go - Updated: All E2E test files (clone, complexity, dead_code) - Benefit: Consistent test configuration management ### 3. Configuration Search Logic Optimization - Eliminated: 5 duplicate directory search loops in findDefaultConfig() - Added: `searchConfigInDirectory()` helper function - Refactored: config.go search logic now uses single reusable function - Benefit: Cleaner, more maintainable configuration discovery ## Code Quality Metrics - Duplicate code blocks removed: 14 → 0 - Lines of duplicate code eliminated: ~80 lines - Helper functions added: 3 - Files affected: 10 files updated, 1 file added ## Impact - Maintainability: Changes now require updates in 1 location vs 14 locations - Readability: Cleaner, more focused functions with clear responsibilities - Testability: Centralized helpers make testing easier and more consistent - Future development: Adding new config options or test patterns is now trivial ## Validation - All tests pass (138 tests) - No linting issues (golangci-lint clean) - No functional changes - pure refactoring - Configuration discovery behavior unchanged 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-30 22:17:48 +09:00
Daisuke Yoda	0bc3128779	feat: implement Ruff-style hierarchical configuration system ## Overview Implements a comprehensive configuration discovery system inspired by Ruff, eliminating environment variable dependencies and providing seamless project integration through YAML configuration files. ## Key Features - Hierarchical Discovery: Target directory → upward traversal → XDG config → home directory - Multiple Formats: Support for .pyqol.yaml, .pyqol.yml, pyqol.yaml, pyqol.yml, and JSON variants - Output Directory Control: Configure report file destinations via output.directory - Test Environment: All tests now use temporary directories, no project file pollution ## Configuration Discovery Order 1. Target directory & parent directories (search upward to filesystem root) 2. XDG config directory ($XDG_CONFIG_HOME/pyqol/ or ~/.config/pyqol/) 3. Home directory (~/.pyqol.yaml for backward compatibility) ## Implementation Details - Added LoadConfigWithTarget() for target-aware config discovery - Updated all commands to use generateFileNameWithTarget() - Modified E2E tests to use temporary config files - Fixed unit tests that were generating files in project directories - Enhanced utils.go with configuration-aware filename generation ## Breaking Changes - Environment variable PYQOL_OUTPUT_DIR is no longer supported - All file output now controlled via .pyqol.yaml configuration ## Documentation Updates - Updated README.md with configuration examples and usage - Enhanced DEVELOPMENT.md with configuration system documentation - Updated ARCHITECTURE.md to reflect new configuration discovery - Added practical examples for configuration-based workflows ## Testing - All tests pass without generating files in project directories - E2E tests use temporary config files for output control - Unit tests updated to avoid project directory pollution - Configuration discovery thoroughly tested 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-30 21:23:35 +09:00
DaisukeYoda	c8f357e636	Merge pull request #59 from pyqol/feature/format-flags-and-file-output feat: improve output format flags and file generation	2025-08-28 23:23:48 +09:00
Daisuke Yoda	97a8e375dd	feat: improve output format flags and file generation - Replace --format flag with individual format flags (--html, --json, --csv, --yaml) - Add automatic file generation for non-text formats with timestamps - Add HTML report auto-opening with --no-open flag to disable - Add unified analyze command with comprehensive reporting - Add browser opening utility for cross-platform support - Update all commands (clone, complexity, deadcode) to use new format system - Add OutputPath and NoOpen fields to domain models - Update configuration loaders to handle new flag system - Add HTML format support to all formatters - Update tests to match new flag expectations 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-28 22:19:53 +09:00
Daisuke Yoda	fca773d5b2	fix: fix APTED algorithm missing td matrix update in else branch - Fix missing td matrix update in computeForestDistance else branch - This was causing trees with same labels but different structures to return distance 0.0 (100% similarity) - Update test expectation for more accurate similarity calculation - Average similarity in real projects now shows realistic 0.84 instead of 1.00 - Clone detection now reports accurate clone pair counts instead of fallback values 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-28 20:32:43 +09:00
DaisukeYoda	4214dea453	fix: resolve golangci-lint ineffectual assignment warning	2025-08-24 16:00:04 +09:00
DaisukeYoda	07c3fea737	fix: address concurrency concerns with thread-safe FlagTracker - Implement thread-safe FlagTracker with sync.RWMutex for concurrent access - Replace raw map[string]bool with FlagTracker throughout service layer - Add comprehensive tests including concurrent access testing - Create CloneConfigurationLoaderWithFlags for consistency - Ensure all flag tracking operations are thread-safe This addresses the potential race conditions identified in PR review.	2025-08-24 15:34:56 +09:00
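
For readers unfamiliar with the pattern named in commit 07c3fea737, the following is a minimal sketch of a flag tracker guarded by sync.RWMutex in place of a raw map[string]bool. It is an illustration only: the method names (NewFlagTracker, Set, WasSet) and the flag names are assumptions, not taken from the pyscn source.

```go
package main

import (
	"fmt"
	"sync"
)

// FlagTracker records which CLI flags were explicitly set.
// Hypothetical sketch: field and method names are assumptions,
// not pyscn's actual API.
type FlagTracker struct {
	mu    sync.RWMutex
	flags map[string]bool
}

// NewFlagTracker returns an empty tracker ready for concurrent use.
func NewFlagTracker() *FlagTracker {
	return &FlagTracker{flags: make(map[string]bool)}
}

// Set marks a flag as explicitly provided; it takes the write lock.
func (t *FlagTracker) Set(name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.flags[name] = true
}

// WasSet reports whether a flag was provided; it takes only the read
// lock, so many readers can proceed in parallel.
func (t *FlagTracker) WasSet(name string) bool {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return t.flags[name]
}

func main() {
	tracker := NewFlagTracker()

	// Write from several goroutines at once; a bare map would race here.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			tracker.Set(fmt.Sprintf("flag-%d", i))
		}(i)
	}
	wg.Wait()

	fmt.Println(tracker.WasSet("flag-3"), tracker.WasSet("missing")) // prints "true false"
}
```

A read-write mutex suits this workload because flags are typically written once during argument parsing and then read many times by concurrent analyzers.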

101 Commits