Architecture Overview
System Design
pyscn follows Clean Architecture principles with clear separation of concerns and dependency inversion. The system is designed as a modular, high-performance static analysis tool for Python code.
graph TB
subgraph "CLI Layer"
A[CLI Commands] --> B[ComplexityCommand]
end
subgraph "Application Layer"
B --> C[ComplexityUseCase]
end
subgraph "Domain Layer"
C --> D[ComplexityService Interface]
C --> E[FileReader Interface]
C --> F[OutputFormatter Interface]
end
subgraph "Service Layer"
G[ComplexityService] -.-> D
H[FileReader] -.-> E
I[OutputFormatter] -.-> F
J[ConfigurationLoader]
K[ProgressReporter]
end
subgraph "Infrastructure Layer"
G --> L[Tree-sitter Parser]
G --> M[CFG Builder]
G --> N[Complexity Calculator]
H --> O[File System]
I --> P[JSON/YAML/CSV/HTML Formatters]
end
L --> Q[Python Source Code]
M --> R[Control Flow Graphs]
N --> S[Complexity Metrics]
Clean Architecture Layers
1. Domain Layer (domain/)
The innermost layer, containing business rules and entities. It has no dependencies on external frameworks.
// domain/complexity.go
type ComplexityService interface {
Analyze(ctx context.Context, req ComplexityRequest) (ComplexityResponse, error)
AnalyzeFile(ctx context.Context, filePath string, req ComplexityRequest) (ComplexityResponse, error)
}
type FileReader interface {
CollectPythonFiles(paths []string, recursive bool, include, exclude []string) ([]string, error)
IsValidPythonFile(path string) bool
}
type OutputFormatter interface {
Write(response ComplexityResponse, format OutputFormat, writer io.Writer) error
}
type ComplexityRequest struct {
Paths []string
OutputFormat OutputFormat
OutputWriter io.Writer
MinComplexity int
MaxComplexity int
SortBy SortCriteria
LowThreshold int
MediumThreshold int
ShowDetails bool
Recursive bool
IncludePatterns []string
ExcludePatterns []string
ConfigPath string
}
2. Application Layer (app/)
Orchestrates business logic and coordinates between domain services.
// app/complexity_usecase.go
type ComplexityUseCase struct {
service domain.ComplexityService
fileReader domain.FileReader
formatter domain.OutputFormatter
configLoader domain.ConfigurationLoader
progress domain.ProgressReporter
}
func (uc *ComplexityUseCase) Execute(ctx context.Context, req domain.ComplexityRequest) error {
// 1. Validate input
// 2. Load configuration
// 3. Collect Python files
// 4. Perform analysis
// 5. Format and output results
}
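A hedged sketch of how these steps could be wired together using the domain interfaces above; mergeConfig, mergeResponses, and the LoadConfig method name are assumptions rather than the actual API, and error handling is simplified.
// Expanded sketch of the Execute workflow (illustrative only)
func (uc *ComplexityUseCase) Execute(ctx context.Context, req domain.ComplexityRequest) error {
    // 1. Validate input
    if len(req.Paths) == 0 {
        return fmt.Errorf("no input paths provided")
    }
    // 2. Load configuration (explicit path first, then discovery)
    cfg, err := uc.configLoader.LoadConfig(req.ConfigPath) // assumed method name
    if err != nil {
        return err
    }
    req = mergeConfig(req, cfg) // hypothetical helper applying config defaults
    // 3. Collect Python files
    files, err := uc.fileReader.CollectPythonFiles(req.Paths, req.Recursive, req.IncludePatterns, req.ExcludePatterns)
    if err != nil {
        return err
    }
    // 4. Perform analysis per file (progress reporting omitted for brevity)
    var responses []domain.ComplexityResponse
    for _, f := range files {
        resp, err := uc.service.AnalyzeFile(ctx, f, req)
        if err != nil {
            return err
        }
        responses = append(responses, resp)
    }
    // 5. Format and output results (mergeResponses is a hypothetical aggregation helper)
    return uc.formatter.Write(mergeResponses(responses), req.OutputFormat, req.OutputWriter)
}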
3. Service Layer (service/)
Implements domain interfaces with concrete business logic.
// service/complexity_service.go
type ComplexityService struct {
progress domain.ProgressReporter
}
func (s *ComplexityService) Analyze(ctx context.Context, req domain.ComplexityRequest) (domain.ComplexityResponse, error) {
// Implements the complexity analysis workflow
}
4. CLI Layer (cmd/pyscn/)
Thin adapter layer that handles user input and delegates to application layer.
// cmd/pyscn/complexity_clean.go
type ComplexityCommand struct {
outputFormat string
minComplexity int
maxComplexity int
// ... other CLI flags
}
func (c *ComplexityCommand) runComplexityAnalysis(cmd *cobra.Command, args []string) error {
// 1. Parse CLI flags into domain request
// 2. Create use case with dependencies
// 3. Execute use case
// 4. Handle errors appropriately
}
Core Components
1. Parser Module (internal/parser)
The parser module handles Python code parsing using tree-sitter.
// internal/parser/parser.go
type Parser struct {
language *sitter.Language
parser *sitter.Parser
}
type Node struct {
Type NodeType
Value string
Children []*Node
Location Location
}
type Location struct {
File string
Line int
Col int
}
Responsibilities:
- Parse Python source files
- Build internal AST representation
- Handle syntax errors gracefully
- Support Python 3.8+ syntax
Key Files:
- parser.go: Main parser implementation
- python.go: Python-specific parsing logic
- ast.go: AST node definitions
- visitor.go: AST visitor pattern implementation
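A minimal, self-contained sketch of the tree-sitter entry point, assuming the smacker/go-tree-sitter bindings listed under Dependencies; the actual parser.go additionally converts the tree-sitter tree into the internal Node representation above.
// internal/parser - sketch of the tree-sitter entry point
package parser

import (
    "context"

    sitter "github.com/smacker/go-tree-sitter"
    "github.com/smacker/go-tree-sitter/python"
)

// ParsePython parses Python source into a tree-sitter tree.
func ParsePython(ctx context.Context, source []byte) (*sitter.Tree, error) {
    p := sitter.NewParser()
    p.SetLanguage(python.GetLanguage())
    return p.ParseCtx(ctx, nil, source)
}

// walk visits every node; n.Type() yields tree-sitter grammar node names
// such as "function_definition".
func walk(n *sitter.Node, visit func(*sitter.Node)) {
    visit(n)
    for i := 0; i < int(n.ChildCount()); i++ {
        walk(n.Child(i), visit)
    }
}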
2. Analyzer Module (internal/analyzer)
The analyzer module contains the core analysis algorithms.
2.1 Control Flow Graph (CFG)
// internal/analyzer/cfg.go
type CFG struct {
Entry *BasicBlock
Exit *BasicBlock
Blocks map[string]*BasicBlock
}
type BasicBlock struct {
ID string
Statements []ast.Node
Successors []*BasicBlock
Predecessors []*BasicBlock
}
type CFGBuilder struct {
current *BasicBlock
cfg *CFG
loops []LoopContext
breaks []BreakContext
}
Algorithm:
- Create entry and exit blocks
- Process statements sequentially
- Handle control flow statements:
  - if/elif/else: Create branches
  - for/while: Create loop structures
  - break/continue: Update loop edges
  - return: Connect to exit block
  - try/except: Handle exception flow
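For instance, an if/else statement could be lowered into branch and merge blocks roughly as follows; IfStmt and the newBlock/connect/visitBody helpers are illustrative names, not the builder's actual API.
// Sketch of if/else lowering in the CFG builder (illustrative helpers)
func (b *CFGBuilder) visitIf(stmt *IfStmt) {
    condBlock := b.current // the condition is evaluated in the current block
    thenBlock := b.newBlock("if.then")
    elseBlock := b.newBlock("if.else")
    mergeBlock := b.newBlock("if.end")

    b.connect(condBlock, thenBlock)
    b.connect(condBlock, elseBlock)

    b.current = thenBlock
    b.visitBody(stmt.Body)
    b.connect(b.current, mergeBlock)

    b.current = elseBlock
    if stmt.Else != nil {
        b.visitBody(stmt.Else)
    }
    b.connect(b.current, mergeBlock)

    b.current = mergeBlock // subsequent statements attach here
}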
2.2 Dead Code Detection
// internal/analyzer/dead.go
type DeadCodeDetector struct {
cfg *CFG
reached map[string]bool
liveVars map[string]VarInfo
}
type Finding struct {
Type FindingType
Location Location
Message string
Severity Severity
}
Algorithm:
- Mark entry block as reachable
- Perform breadth-first traversal
- Mark all visited blocks as reachable
- Report unreachable blocks as dead code
- Analyze variable usage for unused detection
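Expressed over the CFG and BasicBlock types above, the reachability pass is a short breadth-first traversal; a minimal sketch:
// markReachable returns the set of block IDs reachable from the entry block;
// blocks missing from the set are reported as dead code.
func markReachable(cfg *CFG) map[string]bool {
    reached := map[string]bool{cfg.Entry.ID: true}
    queue := []*BasicBlock{cfg.Entry}
    for len(queue) > 0 {
        block := queue[0]
        queue = queue[1:]
        for _, succ := range block.Successors {
            if !reached[succ.ID] {
                reached[succ.ID] = true
                queue = append(queue, succ)
            }
        }
    }
    return reached
}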
2.3 APTED Clone Detection with LSH Acceleration
// internal/analyzer/apted.go
type APTEDAnalyzer struct {
threshold float64
costModel CostModel
lsh *LSHIndex // LSH acceleration for large projects
}
type TreeNode struct {
Label string
Children []*TreeNode
Parent *TreeNode
ID int
Features []uint64 // Hash features for LSH
}
type CostModel interface {
Insert(node *TreeNode) float64
Delete(node *TreeNode) float64
Rename(node1, node2 *TreeNode) float64
}
// LSH (Locality-Sensitive Hashing) for acceleration
type LSHIndex struct {
bands int
rows int
hashes int
buckets map[string][]*CodeFragment
extractor *FeatureExtractor
}
type FeatureExtractor struct {
// Extract features for LSH hashing
SubtreeHashes bool
KGrams int
Patterns []string
}
Two-Stage Detection Process:
Stage 1: LSH Candidate Generation (for large projects)
- Extract AST features (subtree hashes, k-grams, patterns)
- Apply MinHash + LSH banding to find candidate pairs
- Filter candidates by similarity threshold
- Early termination for dissimilar pairs
Stage 2: APTED Verification
- Convert candidate pairs to ordered trees
- Compute precise tree edit distance using APTED
- Use dynamic programming with path decomposition
- Compare distance against threshold
- Apply advanced grouping algorithms
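A condensed sketch of this two-stage flow; CandidatePairs, TreeEditDistance, normalizedSimilarity, the Tree field on CodeFragment, and the ClonePair result type are all illustrative names rather than the actual internal API.
// Two-stage clone detection sketch (illustrative names only)
type ClonePair struct {
    First, Second *CodeFragment
    Similarity    float64
}

func (a *APTEDAnalyzer) detectClones(fragments []*CodeFragment) []ClonePair {
    var pairs [][2]*CodeFragment
    if a.lsh != nil {
        // Stage 1: LSH proposes only pairs that share at least one bucket.
        pairs = a.lsh.CandidatePairs(fragments)
    } else {
        // Small projects: fall back to comparing every pair.
        for i := range fragments {
            for j := i + 1; j < len(fragments); j++ {
                pairs = append(pairs, [2]*CodeFragment{fragments[i], fragments[j]})
            }
        }
    }
    var clones []ClonePair
    for _, p := range pairs {
        // Stage 2: precise APTED tree edit distance, normalized to a similarity in [0, 1].
        dist := TreeEditDistance(p[0].Tree, p[1].Tree, a.costModel)
        if sim := normalizedSimilarity(dist, p[0].Tree, p[1].Tree); sim >= a.threshold {
            clones = append(clones, ClonePair{First: p[0], Second: p[1], Similarity: sim})
        }
    }
    return clones
}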
Clone Grouping Algorithms:
type GroupingMode string
const (
GroupingModeConnected GroupingMode = "connected" // Connected components
GroupingModeStar GroupingMode = "star" // Star/medoid clustering
GroupingModeCompleteLinkage GroupingMode = "complete_linkage" // Complete linkage clustering
GroupingModeKCore GroupingMode = "k_core" // K-core decomposition
)
type CloneGroup struct {
ID string
Clones []*Clone
Centroid *Clone // Representative clone
Similarity float64 // Intra-group similarity
Algorithm GroupingMode // Grouping algorithm used
}
- Connected Components: Groups clones based on similarity edges
- Star/Medoid: Finds representative (medoid) and groups around it
- Complete Linkage: Hierarchical clustering with maximum distance constraint
- K-Core: Identifies densely connected clone groups
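As an example, connected-component grouping reduces to a union-find over clone pairs whose similarity clears the grouping threshold; a self-contained sketch:
// groupConnected merges clones (indexed 0..n-1) into connected components.
// pairs[i] holds two clone indices and sims[i] their similarity.
func groupConnected(n int, pairs [][2]int, sims []float64, threshold float64) [][]int {
    parent := make([]int, n)
    for i := range parent {
        parent[i] = i
    }
    var find func(int) int
    find = func(x int) int {
        if parent[x] != x {
            parent[x] = find(parent[x]) // path compression
        }
        return parent[x]
    }
    for i, p := range pairs {
        if sims[i] >= threshold {
            parent[find(p[0])] = find(p[1]) // union the two components
        }
    }
    groups := make(map[int][]int)
    for i := 0; i < n; i++ {
        root := find(i)
        groups[root] = append(groups[root], i)
    }
    var result [][]int
    for _, members := range groups {
        if len(members) > 1 { // a clone group needs at least two members
            result = append(result, members)
        }
    }
    return result
}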
3. Configuration Module (internal/config)
The configuration system implements TOML-only configuration discovery similar to Ruff, with support for both dedicated .pyscn.toml files and pyproject.toml integration.
// internal/config/config.go
type Config struct {
// Analysis settings
DeadCode DeadCodeConfig `toml:"dead_code"`
Clones CloneConfig `toml:"clones"`
Complexity ComplexityConfig `toml:"complexity"`
CBO CBOConfig `toml:"cbo"`
// Output settings
Output OutputConfig `toml:"output"`
// File patterns
Analysis AnalysisConfig `toml:"analysis"`
}
type OutputConfig struct {
Format string `toml:"format"`
Directory string `toml:"directory"` // Output directory for reports
ShowDetails bool `toml:"show_details"`
SortBy string `toml:"sort_by"`
MinComplexity int `toml:"min_complexity"`
}
type CloneConfig struct {
// Analysis parameters
MinLines int `toml:"min_lines"`
MinNodes int `toml:"min_nodes"`
SimilarityThreshold float64 `toml:"similarity_threshold"`
// LSH acceleration
LSH LSHConfig `toml:"lsh"`
// Grouping algorithms
Grouping GroupingConfig `toml:"grouping"`
}
type LSHConfig struct {
Enabled string `toml:"enabled"` // "true", "false", "auto"
AutoThreshold int `toml:"auto_threshold"` // Auto-enable for projects >N files
SimilarityThreshold float64 `toml:"similarity_threshold"`
Bands int `toml:"bands"`
Rows int `toml:"rows"`
Hashes int `toml:"hashes"`
}
type GroupingConfig struct {
Mode string `toml:"mode"` // "connected", "star", "complete_linkage", "k_core"
Threshold float64 `toml:"threshold"`
KCoreK int `toml:"k_core_k"`
}
Configuration Discovery Algorithm
pyscn uses a TOML-only hierarchical configuration discovery system:
// LoadConfigWithTarget searches for configuration in this order:
func LoadConfigWithTarget(configPath string, targetPath string) (*Config, error) {
// 1. Explicit config path (highest priority)
if configPath != "" {
return loadFromFile(configPath)
}
// 2. Search from target directory upward
if targetPath != "" {
if config := searchUpward(targetPath); config != "" {
return loadFromFile(config)
}
}
// 3. Current directory
if config := findInDirectory("."); config != "" {
return loadFromFile(config)
}
// 4. Default configuration
return DefaultConfig(), nil
}
Configuration File Priority:
1. .pyscn.toml (dedicated config file)
2. pyproject.toml (with [tool.pyscn] section)
Search Strategy:
- Target Directory & Parents: Starting from the analysis target, search upward to filesystem root
- TOML-only: Simplified configuration strategy focusing on modern TOML format
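The upward search itself reduces to a short loop over parent directories; a sketch using only the standard library (the real loader would additionally verify that a pyproject.toml actually contains a [tool.pyscn] table):
// searchUpward walks from the target directory toward the filesystem root and
// returns the first configuration file found, preferring .pyscn.toml.
func searchUpward(start string) string {
    dir, err := filepath.Abs(start)
    if err != nil {
        return ""
    }
    for {
        for _, name := range []string{".pyscn.toml", "pyproject.toml"} {
            candidate := filepath.Join(dir, name)
            if _, err := os.Stat(candidate); err == nil {
                return candidate
            }
        }
        parent := filepath.Dir(dir)
        if parent == dir { // reached the filesystem root
            return ""
        }
        dir = parent
    }
}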
4. CLI Module (cmd/pyscn)
The CLI layer uses the Command pattern with Cobra framework.
// cmd/pyscn/main.go - Root command setup
type CLI struct {
rootCmd *cobra.Command
}
// cmd/pyscn/complexity_clean.go - Command implementation
type ComplexityCommand struct {
outputFormat string
minComplexity int
maxComplexity int
sortBy string
showDetails bool
configFile string
lowThreshold int
mediumThreshold int
verbose bool
}
// Available Commands:
// - complexity: Calculate McCabe cyclomatic complexity
// - deadcode: Find unreachable code using CFG analysis
// - clone: Detect code clones using APTED with LSH acceleration
// - cbo: Analyze Coupling Between Objects metrics
// - analyze: Run comprehensive analysis with unified reporting
// - check: Quick CI-friendly quality check
// - init: Generate configuration file
Dependency Injection & Builder Pattern
The system uses dependency injection to achieve loose coupling and testability.
// app/complexity_usecase.go - Builder pattern for complex object creation
type ComplexityUseCaseBuilder struct {
service domain.ComplexityService
fileReader domain.FileReader
formatter domain.OutputFormatter
configLoader domain.ConfigurationLoader
progress domain.ProgressReporter
}
func NewComplexityUseCaseBuilder() *ComplexityUseCaseBuilder
func (b *ComplexityUseCaseBuilder) WithService(service domain.ComplexityService) *ComplexityUseCaseBuilder
func (b *ComplexityUseCaseBuilder) WithFileReader(fileReader domain.FileReader) *ComplexityUseCaseBuilder
func (b *ComplexityUseCaseBuilder) Build() (*ComplexityUseCase, error)
// cmd/pyscn/complexity_clean.go - Dependency assembly
func (c *ComplexityCommand) createComplexityUseCase(cmd *cobra.Command) (*app.ComplexityUseCase, error) {
// Create services
fileReader := service.NewFileReader()
formatter := service.NewOutputFormatter()
configLoader := service.NewConfigurationLoader()
progress := service.CreateProgressReporter(cmd.ErrOrStderr(), 0, c.verbose)
complexityService := service.NewComplexityService(progress)
// Build use case with dependencies
return app.NewComplexityUseCaseBuilder().
WithService(complexityService).
WithFileReader(fileReader).
WithFormatter(formatter).
WithConfigLoader(configLoader).
WithProgress(progress).
Build()
}
Data Flow
1. Input Processing
Source File → Read → Tokenize → Parse → AST
2. Analysis Pipeline
AST → CFG Construction → Dead Code Analysis → Results
AST → APTED Analysis → Clone Detection → Results
3. Output Generation
Results → Aggregation → Formatting → Output (CLI/JSON/SARIF)
Performance Optimizations
1. Parallel Processing
- Parse multiple files concurrently
- Run independent analyses in parallel
- Use worker pools for large codebases
- Batch processing for clone detection
type WorkerPool struct {
workers int
jobs chan Job
results chan Result
waitGroup sync.WaitGroup
}
type BatchProcessor struct {
batchSize int
maxMemoryMB int
timeout time.Duration
}
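A minimal sketch of the fan-out/fan-in pattern behind this pool, using only the standard library; FileResult and analyzeOne are illustrative placeholders rather than the actual types.
// FileResult is an illustrative per-file result container.
type FileResult struct {
    Path string
    Err  error
}

// analyzeFiles fans file paths out to a fixed number of workers and collects results.
func analyzeFiles(paths []string, workers int, analyzeOne func(string) FileResult) []FileResult {
    jobs := make(chan string)
    results := make(chan FileResult)

    var wg sync.WaitGroup
    for w := 0; w < workers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for path := range jobs {
                results <- analyzeOne(path)
            }
        }()
    }

    go func() {
        for _, p := range paths {
            jobs <- p
        }
        close(jobs)
        wg.Wait()
        close(results)
    }()

    var out []FileResult
    for r := range results {
        out = append(out, r)
    }
    return out
}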
2. LSH Acceleration
- Automatic LSH activation for large projects (>500 files)
- Two-stage detection: LSH candidates + APTED verification
- Configurable hash functions and banding parameters
- Early termination for dissimilar pairs
type LSHConfig struct {
Enabled string // "auto", "true", "false"
AutoThreshold int // Auto-enable threshold
SimilarityThreshold float64
Bands int
Rows int
Hashes int
}
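The banding step that turns a MinHash signature into bucket keys is compact; a self-contained sketch (the key encoding is simplified relative to whatever the real index uses). Two fragments that agree on any full band land in the same bucket and become a candidate pair.
// bandKeys splits a MinHash signature into bands and returns one bucket key per band.
func bandKeys(signature []uint64, bands, rows int) []string {
    keys := make([]string, 0, bands)
    for b := 0; b < bands; b++ {
        h := fnv.New64a()
        for r := 0; r < rows; r++ {
            idx := b*rows + r
            if idx >= len(signature) {
                break
            }
            fmt.Fprintf(h, "%d:", signature[idx]) // fold the band's rows into one hash
        }
        keys = append(keys, fmt.Sprintf("%d-%x", b, h.Sum64()))
    }
    return keys
}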
3. Memory Management
- Stream large files instead of loading entirely
- Reuse AST nodes where possible
- Clear unused CFG blocks after analysis
- Use object pools for frequent allocations
- Memory-aware batch processing
4. Caching (Future Enhancement)
Note: Caching is not yet implemented in v1.0.0. This section describes the planned architecture for future releases.
Planned caching features:
- Cache parsed ASTs for unchanged files
- Store CFGs for incremental analysis
- Memoize APTED distance calculations
- LSH signature caching
// Planned implementation (not yet available)
type Cache struct {
ast map[string]*AST // File hash → AST
cfg map[string]*CFG // Function → CFG
dist map[string]float64 // Node pair → distance
lshSigs map[string][]uint64 // File → LSH signatures
}
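One plausible shape for the planned AST cache is a content-addressed lookup, so any edit to a file invalidates its entry automatically; an illustrative sketch only, not part of v1.0.0.
// getAST sketches a content-hash keyed lookup for the planned AST cache.
// parseFile is an illustrative placeholder; c.ast is assumed to be initialized.
func (c *Cache) getAST(path string, parseFile func([]byte) (*AST, error)) (*AST, error) {
    source, err := os.ReadFile(path)
    if err != nil {
        return nil, err
    }
    sum := sha256.Sum256(source)
    key := hex.EncodeToString(sum[:])

    if ast, ok := c.ast[key]; ok {
        return ast, nil // cache hit: file content unchanged
    }
    ast, err := parseFile(source)
    if err != nil {
        return nil, err
    }
    c.ast[key] = ast
    return ast, nil
}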
Error Handling
Error Types
type ErrorType int
const (
ParseError ErrorType = iota
AnalysisError
ConfigError
IOError
)
type Error struct {
Type ErrorType
Message string
Location *Location
Cause error
}
Recovery Strategies
- Parse Errors: Skip problematic file, continue with others
- Analysis Errors: Report partial results, mark incomplete
- Config Errors: Use defaults, warn user
- IO Errors: Retry with backoff, then fail gracefully
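The IO strategy can be as simple as exponential backoff around the read; a minimal, standard-library-only sketch:
// readWithRetry retries a file read with exponential backoff, then fails
// gracefully by returning the last error.
func readWithRetry(path string, attempts int) ([]byte, error) {
    var lastErr error
    delay := 50 * time.Millisecond
    for i := 0; i < attempts; i++ {
        data, err := os.ReadFile(path)
        if err == nil {
            return data, nil
        }
        lastErr = err
        time.Sleep(delay)
        delay *= 2 // exponential backoff
    }
    return nil, lastErr
}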
Extension Points
1. Custom Analyzers
type Analyzer interface {
Name() string
Analyze(ast *AST) ([]Finding, error)
Configure(config map[string]interface{}) error
}
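A hypothetical analyzer implementing this interface, purely to illustrate the extension point; the Walk helper and the "call" node type name are assumptions about the internal AST API.
// PrintCallAnalyzer flags print() calls; illustrative example only.
type PrintCallAnalyzer struct{}

func (a *PrintCallAnalyzer) Name() string { return "print-call" }

func (a *PrintCallAnalyzer) Configure(config map[string]interface{}) error {
    // A real analyzer would read its options (severity, ignore lists, etc.) here.
    return nil
}

func (a *PrintCallAnalyzer) Analyze(ast *AST) ([]Finding, error) {
    var findings []Finding
    // Walk is an assumed traversal helper over the internal AST.
    ast.Walk(func(n *Node) {
        if n.Type == NodeType("call") && n.Value == "print" {
            findings = append(findings, Finding{
                Location: n.Location,
                Message:  "print() call found; prefer structured logging",
            })
        }
    })
    return findings, nil
}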
2. Output Formatters
type Formatter interface {
Format(findings []Finding) ([]byte, error)
Extension() string
ContentType() string
}
3. Language Support
type Language interface {
Name() string
Parse(source []byte) (*AST, error)
GetGrammar() *sitter.Language
}
Testing Strategy
pyscn follows a comprehensive testing approach with multiple layers of validation.
1. Unit Tests
Test individual components in isolation with dependency injection.
// domain/complexity_test.go - Domain entity tests
func TestOutputFormat(t *testing.T) {
tests := []struct {
name string
format OutputFormat
valid bool
}{
{"Text format", OutputFormatText, true},
{"JSON format", OutputFormatJSON, true},
{"Invalid format", OutputFormat("invalid"), false},
}
// Table-driven test implementation
}
// internal/analyzer/complexity_test.go - Algorithm tests
func TestCalculateComplexity(t *testing.T) {
tests := []struct {
name string
cfg *CFG
expected int
}{
{"Simple function", createSimpleCFG(), 1},
{"If statement", createIfCFG(), 2},
{"Nested conditions", createNestedCFG(), 4},
}
// Algorithm validation
}
Coverage: >80% across all packages
Approach: Table-driven tests, dependency mocking, boundary condition testing
2. Integration Tests
Test layer interactions and workflows with real dependencies.
// integration/complexity_integration_test.go
func TestComplexityCleanFiltering(t *testing.T) {
// Create services (real implementations)
fileReader := service.NewFileReader()
outputFormatter := service.NewOutputFormatter()
configLoader := service.NewConfigurationLoader()
progressReporter := service.NewNoOpProgressReporter()
complexityService := service.NewComplexityService(progressReporter)
// Create use case with real dependencies
useCase := app.NewComplexityUseCase(
complexityService,
fileReader,
outputFormatter,
configLoader,
progressReporter,
)
// Test with real Python files and verify results
}
Scope: Service layer interactions, use case workflows, configuration loading
Data: Real Python code samples in testdata/
3. End-to-End Tests
Test complete user workflows through the CLI interface.
// e2e/complexity_e2e_test.go
func TestComplexityE2EBasic(t *testing.T) {
// Build actual binary
binaryPath := buildPyscnBinary(t)
defer os.Remove(binaryPath)
// Create test Python files
testDir := t.TempDir()
createTestPythonFile(t, testDir, "simple.py", pythonCode)
// Execute CLI command
cmd := exec.Command(binaryPath, "complexity", testDir)
var stdout, stderr bytes.Buffer
cmd.Stdout = &stdout
cmd.Stderr = &stderr
// Verify output and exit code
err := cmd.Run()
assert.NoError(t, err)
assert.Contains(t, stdout.String(), "simple_function")
}
Scenarios:
- Basic analysis with text output
- JSON format validation
- CLI flag parsing and validation
- Error handling (missing files, invalid arguments)
- Multiple file analysis
4. Command Interface Tests
Test CLI command structure and validation without full execution.
// cmd/pyscn/complexity_test.go
func TestComplexityCommandInterface(t *testing.T) {
complexityCmd := NewComplexityCommand()
cobraCmd := complexityCmd.CreateCobraCommand()
// Test command structure
assert.Equal(t, "complexity [files...]", cobraCmd.Use)
assert.NotEmpty(t, cobraCmd.Short)
// Test flags are properly configured
expectedFlags := []string{"format", "min", "max", "sort", "details"}
for _, flagName := range expectedFlags {
flag := cobraCmd.Flags().Lookup(flagName)
assert.NotNil(t, flag, "Flag %s should be defined", flagName)
}
}
5. Test Data Organization
testdata/
├── python/
│ ├── simple/ # Basic Python constructs
│ │ ├── functions.py # Simple function definitions
│ │ ├── classes.py # Class definitions
│ │ └── control_flow.py # Basic if/for/while
│ ├── complex/ # Complex code patterns
│ │ ├── exceptions.py # Try/except/finally
│ │ ├── async_await.py # Async/await patterns
│ │ └── comprehensions.py # List/dict comprehensions
│ └── edge_cases/ # Edge cases and errors
│ ├── nested_structures.py # Deep nesting
│ ├── syntax_errors.py # Invalid syntax
│ └── python310_features.py # Modern Python features
├── integration/ # Integration test fixtures
└── e2e/ # E2E test temporary files
6. Performance & Benchmark Tests
// internal/analyzer/complexity_benchmark_test.go
func BenchmarkComplexityCalculation(b *testing.B) {
cfg := createLargeCFG() // CFG with 1000+ nodes
b.ResetTimer()
for i := 0; i < b.N; i++ {
result := CalculateComplexity(cfg)
_ = result // Prevent compiler optimization
}
}
// Benchmark targets:
// - Parser performance: >100,000 lines/second
// - CFG construction: >10,000 lines/second
// - Complexity calculation: <1ms per function
7. Test Execution
# Run all tests
go test ./...
# Run with coverage
go test -cover ./...
# Run specific test suites
go test ./cmd/pyscn # Command interface tests
go test ./integration # Integration tests
go test ./e2e # End-to-end tests
# Run benchmarks
go test -bench=. ./internal/analyzer
8. Continuous Integration
All tests run automatically on:
- Go 1.24: Current version
- Go 1.25: Latest stable version (when available)
- Linux, macOS, Windows: Cross-platform compatibility
Quality Gates:
- All tests must pass
- Code coverage >80%
- No linting errors
- Build success on all platforms
Security Considerations
1. Input Validation
- Validate file paths
- Limit file sizes
- Sanitize configuration
- Check for path traversal
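As an illustration of the path-traversal check, resolved paths can be constrained to the analysis root; a standard-library-only sketch (symlink handling omitted):
// validatePath rejects candidate paths that escape the analysis root.
func validatePath(root, candidate string) (string, error) {
    absRoot, err := filepath.Abs(root)
    if err != nil {
        return "", err
    }
    absPath, err := filepath.Abs(filepath.Join(absRoot, candidate))
    if err != nil {
        return "", err
    }
    if absPath != absRoot && !strings.HasPrefix(absPath, absRoot+string(filepath.Separator)) {
        return "", fmt.Errorf("path %q escapes analysis root %q", candidate, root)
    }
    return absPath, nil
}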
2. Resource Limits
- Cap memory usage
- Limit goroutines
- Timeout long operations
- Prevent infinite loops
3. Safe Parsing
- Handle malformed code
- Prevent parser exploits
- Validate AST depth
- Limit recursion
Development Progress & Roadmap
Phase 1 (MVP - Completed September 2025)
- Clean Architecture Implementation - Domain-driven design with dependency injection
- Tree-sitter Integration - Python parsing with go-tree-sitter
- CFG Construction - Control Flow Graph building for all Python constructs
- Complexity Analysis - McCabe cyclomatic complexity with risk assessment
- CLI Framework - Cobra-based command interface with multiple output formats
- Comprehensive Testing - Unit, integration, and E2E test suites
- CI/CD Pipeline - Automated testing on multiple Go versions and platforms
- Dead Code Detection - CFG-based unreachable code identification
- APTED Clone Detection - Tree edit distance for code similarity with LSH acceleration
- Configuration System - TOML-only configuration with hierarchical discovery
- CBO Analysis - Coupling Between Objects metrics
- Advanced Clone Grouping - Multiple algorithms (connected, star, complete linkage, k-core)
- HTML Reports - Rich web-based analysis reports
Future Roadmap (2026 and beyond)
Performance & Scalability (Q1 2026)
- Incremental Analysis - Only analyze changed files for faster CI/CD
- Distributed Processing - Multi-node analysis for enterprise codebases
- Enhanced Caching - Persistent analysis cache across runs
- Memory Optimizations - Further reduce memory footprint
Developer Experience (Q2 2026)
- VS Code Extension - Real-time analysis in editor with inline suggestions
- IDE Integrations - JetBrains, Vim, Emacs plugins
- Watch Mode - Continuous analysis during development
- Interactive CLI - TUI interface for exploring results
Advanced Analysis (Q3-Q4 2026)
- Type Inference Integration - Enhanced analysis with type information
- Semantic Clone Detection - Beyond structural similarity
- Auto-fix Capabilities - Automated refactoring suggestions
- Dependency Analysis - Import graph analysis and unused dependency detection
- Security Analysis - Static security vulnerability detection
Enterprise Features (2027+)
- Multi-language Support - JavaScript, TypeScript, Go, Rust analysis
- Cloud Analysis Service - SaaS offering for enterprise teams
- Team Analytics - Code quality trends and team insights
- LLM-powered Suggestions - AI-driven code improvement recommendations
Current Status (September 2025)
Completed Features:
- ✅ Full clean architecture with proper separation of concerns
- ✅ McCabe complexity analysis with configurable thresholds
- ✅ Multiple output formats (text, JSON, YAML, CSV, HTML)
- ✅ CLI with comprehensive flag support and validation
- ✅ Robust error handling with domain-specific error types
- ✅ Builder pattern for dependency injection
- ✅ Comprehensive test coverage (unit, integration, E2E)
- ✅ CI/CD pipeline with cross-platform testing
- ✅ Dead code detection with CFG analysis
- ✅ APTED clone detection with LSH acceleration
- ✅ CBO (Coupling Between Objects) analysis
- ✅ Advanced clone grouping algorithms
- ✅ Unified analyze command with HTML reports
Recently Completed:
- ✅ TOML-only configuration system (.pyscn.toml, pyproject.toml)
- ✅ LSH-based clone detection acceleration for large projects
- ✅ Multiple grouping modes (connected, star, complete linkage, k-core)
- ✅ Performance optimizations and batch processing
Performance Benchmarks:
- Parser: >100,000 lines/second ✅
- CFG Construction: >25,000 lines/second ✅
- Complexity Calculation: <0.1ms per function ✅
- Clone Detection: >10,000 lines/second with LSH acceleration ✅
- LSH Candidate Generation: >500,000 functions/second ✅
Dependencies
Core Dependencies
// go.mod
require (
github.com/smacker/go-tree-sitter v0.0.0-20240827094217-dd81d9e9be82
github.com/spf13/cobra v1.9.1
github.com/spf13/viper v1.20.1
github.com/pelletier/go-toml/v2 v2.2.3
github.com/stretchr/testify v1.10.0
)
Development Dependencies
require (
github.com/stretchr/testify v1.8.4
github.com/golangci/golangci-lint v1.55.2
golang.org/x/tools v0.17.0
)
Configuration Examples
Basic Configuration
# .pyscn.toml
[dead_code]
enabled = true
min_severity = "warning"
show_context = false
[clones]
min_lines = 5
similarity_threshold = 0.8
lsh_enabled = "auto"
[output]
format = "text"
sort_by = "name"
[analysis]
exclude_patterns = [
"test_*.py",
"*_test.py",
"**/migrations/**"
]
Advanced Configuration
# .pyscn.toml or pyproject.toml [tool.pyscn] section
[dead_code]
enabled = true
min_severity = "warning"
show_context = true
context_lines = 3
ignore_patterns = ["__all__", "_*"]
[clones]
min_lines = 10
min_nodes = 20
similarity_threshold = 0.7
type1_threshold = 0.98
type2_threshold = 0.95
type3_threshold = 0.85
type4_threshold = 0.70
max_results = 1000
# LSH acceleration for large projects
[clones.lsh]
enabled = "auto"
auto_threshold = 500
similarity_threshold = 0.78
bands = 32
rows = 4
hashes = 128
# Clone grouping algorithms
[clones.grouping]
mode = "connected" # connected | star | complete_linkage | k_core
threshold = 0.85
k_core_k = 2
[complexity]
enabled = true
low_threshold = 9
medium_threshold = 19
max_complexity = 0
[cbo]
enabled = true
low_threshold = 5
medium_threshold = 10
include_builtins = false
[output]
format = "html"
directory = "reports"
show_details = true
[analysis]
recursive = true
include_patterns = ["src/**/*.py", "lib/**/*.py"]
exclude_patterns = [
"test_*.py",
"*_test.py",
"**/migrations/**",
"**/__pycache__/**"
]
Metrics and Monitoring
Analysis Metrics
- Files analyzed
- Lines processed
- Findings detected
- Analysis duration
- Memory peak usage
Quality Metrics
- False positive rate
- Detection accuracy
- Performance benchmarks
- User satisfaction
Telemetry (Optional)
type Telemetry struct {
Version string
OS string
Arch string
FileCount int
LineCount int
Duration time.Duration
Findings map[string]int
}
Conclusion
This architecture provides a solid foundation for a high-performance Python static analysis tool. The modular design allows for easy extension and maintenance, while the performance optimizations ensure scalability to large codebases.