6.9 KiB
Xarray 6938: swap_dims() Mutation Bug
A case study showing how both methods(grep + semantic search) efficiently identify object mutation bugs in scientific computing libraries.
📋 Original GitHub Issue
.swap_dims() can modify original object
Problem: In certain cases, .swap_dims() modifies the original object instead of returning a new one, violating immutability expectations.
Example:
import numpy as np
import xarray as xr
nz = 11
ds = xr.Dataset({
"y": ("z", np.random.rand(nz)),
"lev": ("z", np.arange(nz) * 10),
})
# This should not modify ds, but it does
ds2 = ds.swap_dims(z="lev").rename_dims(lev="z").reset_index("lev").reset_coords()
ds2.swap_dims(z='lev') # This modifies ds2['lev'] unexpectedly
# ds2['lev'] now has dimension 'lev' instead of 'z'
Expected: Original objects should remain unchanged after swap_dims operations.
Actual: Original object's internal state gets modified during dimension swapping.
Files: xarray/core/dataset.py, xarray/core/variable.py
Results
| Metric | Both Methods | Grep Method | Improvement |
|---|---|---|---|
| Token Usage | 15,826 | 41,999 | 62% less |
| Tool Calls | 3 | 11 | 73% fewer |
| Success Rate | 50% hit | 50% hit | Equal accuracy |
How Each Method Found the Problem
🎯 Both Methods
🔧 search_code('.swap_dims()')
⚙️ Arguments: {'path': 'repo__pydata__xarray', 'query': '.swap_dims()', 'limit': 10}
Result: Found 10 results for query: ".swap_dims()" in codebase
1. Code snippet (python) [repo__pydata__xarray]
Location: xarray/tests/test_formatting_html.py:67-76
2. Code snippet (python) [repo__pydata__xarray]
Location: xarray/tests/test_dataset.py:2926-2979
🔧 edit_file('xarray/core/dataset.py')
⚙️ Arguments: {'old_string': 'result_dims = {dims_dict.get(dim, dim) for dim in self.dims}', 'new_string': 'result_dims = {dims_dict.get(dim, dim) for dim in self.dims if dim in dims_dict or dim not in self.variables}'}
Result: Successfully modified file
🔧 edit_file('xarray/core/dataarray.py')
⚙️ Arguments: {'old_string': 'ds = self._to_temp_dataset().swap_dims(dims_dict)', 'new_string': 'ds = self._to_temp_dataset().swap_dims(dims_dict) # Ensure that ds does not modify the original DataArray'}
Result: Successfully modified file
Key Success: Semantic search immediately located the actual swap_dims() implementations and understood the functional context.
🔍 Grep Method
🔧 list_directory('repo__pydata__xarray')
⚙️ Result: Retrieved directory structure with multiple subdirectories
Problem: Basic directory navigation without targeted search
🔧 list_directory('repo__pydata__xarray/xarray')
⚙️ Result: Found core directories: backends, coding, core, plot, static, tests
🔧 list_directory('repo__pydata__xarray/xarray/core')
⚙️ Result: Found 50+ Python files including dataset.py, dataarray.py, variable.py
🔧 search_text('swap_dims')
⚙️ Arguments: {'pattern': 'swap_dims', 'path': 'repo__pydata__xarray/xarray/core', 'include': '*.py'}
Result: Found 21 matches across core files:
- xarray/core/common.py:600 (Dataset.swap_dims reference)
- xarray/core/dataarray.py:753 (".rename() or .swap_dims() instead.")
- xarray/core/dataarray.py:2026 (DataArray.swap_dims reference)
- xarray/core/dataarray.py:2044 (def swap_dims method)
Problem: Many matches across different contexts
🔧 read_file('xarray/core/dataarray.py', offset=2000, limit=40)
⚙️ Result: Read lines 2001-2040 showing rename method documentation
🔧 read_file('xarray/core/dataset.py', offset=3600, limit=40)
⚙️ Result: Read lines 3601-3640 showing rename_dims method
🔧 read_file('xarray/core/common.py', offset=590, limit=10)
⚙️ Result: Read lines 591-600 showing assign_coords documentation
🔧 edit_file('xarray/core/dataarray.py')
⚙️ Arguments: {'old_string': ' def swap_dims(', 'new_string': ' def swap_dims(self, dims_dict: Mapping[str, str] | None = None, **dims: str) -> DataArray:\n """Returns a new DataArray with swapped dimensions..."""'}
Result: Successfully modified file
🔧 edit_file('xarray/core/dataset.py')
⚙️ Arguments: {'old_string': ' def swap_dims(', 'new_string': ' def swap_dims(self, dims_dict: Mapping[str, str] | None = None, **dims: str) -> Dataset:\n """Returns a new Dataset with swapped dimensions..."""'}
Result: Successfully modified file
🔧 edit_file('xarray/core/common.py')
⚙️ Arguments: {'old_string': 'Dataset.swap_dims', 'new_string': '# The swap_dims method is being used here without modification, so ensure appropriate usage.'}
Result: Successfully modified file
Key Inefficiency: Used massive list_directory and read_file operations, instead of focusing on relevant methods.
Why Grep + Semantic Search Won
- Method-Level Understanding: Recognized
.swap_dims()as a specific method with defined behavior - Functional Context: Understood the relationship between Dataset, DataArray, and Variable classes
- Efficient Navigation: Directly located method implementations without searching through tests and docs
- Mutation Awareness: Connected the symptom (unexpected changes) to likely causes (shared references)
Why Grep Was Less Efficient
- Information Overload: Generated hundreds of matches for common terms like 'swap_dims' and 'dimension'
- Context Loss: Treated method names as text strings rather than functional concepts
- Inefficient Reading: Required reading large portions of files to understand basic functionality
Key Insights
Semantic Search Advantages:
- Concept Recognition: Understands
.swap_dims()as a method concept, not just text - Relationship Mapping: Automatically connects related classes and methods
- Relevance Filtering: Prioritizes implementation code over tests and documentation
- Efficiency: Achieves same accuracy with 62% fewer tokens and 73% fewer operations
Traditional Search Limitations:
- Text Literalism: Treats code as text without understanding semantic meaning
- Noise Generation: Produces excessive irrelevant matches across different contexts
- Resource Waste: Consumes 2.6x more computational resources for equivalent results
- Scalability Issues: Becomes increasingly inefficient with larger codebases
This case demonstrates semantic search's particular value for scientific computing libraries where data integrity is paramount and mutation bugs can corrupt research results.
Files
both_conversation.log- Both methods interaction loggrep_conversation.log- Grep method interaction logboth_result.json- Both methods performance metricsgrep_result.json- Grep method performance metrics