Metadata-Version: 2.4
Name: chromperiod
Version: 1.0.0
Summary: Detecting periodic chromatin organization from consecutive accessibility peaks
Home-page: https://github.com/wgebhardt/chromperiod
Author: Wolf H. Gebhardt
Author-email: "Wolf H. Gebhardt" <w.gebhardt@protonmail.com>
License: MIT
Project-URL: Homepage, https://github.com/wgebhardt/chromperiod
Project-URL: Documentation, https://chromperiod.readthedocs.io
Project-URL: Repository, https://github.com/wgebhardt/chromperiod
Project-URL: Bug Tracker, https://github.com/wgebhardt/chromperiod/issues
Keywords: chromatin,wavelet,CWT,ATAC-seq,DNase-seq,A/B compartments,periodicity,Paul wavelet,genomics
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21
Requires-Dist: scipy>=1.7
Requires-Dist: matplotlib>=3.5
Requires-Dist: pandas>=1.3
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: seaborn>=0.11; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: ruff>=0.1; extra == "dev"
Provides-Extra: full
Requires-Dist: seaborn>=0.11; extra == "full"
Requires-Dist: pyBigWig; extra == "full"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# chromperiod: Detecting periodic chromatin organization from consecutive accessibility peaks

[![PyPI version](https://badge.fury.io/py/chromperiod.svg)](https://badge.fury.io/py/chromperiod)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Tests](https://github.com/wgebhardt/chromperiod/actions/workflows/tests.yml/badge.svg)](https://github.com/wgebhardt/chromperiod/actions)

`chromperiod` is a Python package implementing the consecutive-peaks continuous wavelet transform (CWT) analysis pipeline for detecting periodic chromatin accessibility patterns from standard DNase-seq, ATAC-seq, or ChIP-seq narrowPeak files — without requiring chromosome conformation capture (Hi-C) data.

The method reveals that chromatin accessibility encodes periodic compartmental organization at the multi-megabase scale, detectable from widely available accessibility data. This package implements the analytical framework described in:

> Gebhardt WH (2026) Periodic chromatin waves reveal a scaling law of chromosome organization. *Nature* [under review].

The methods preprint describing the `chromperiod` package itself is:

> Gebhardt WH (2026) chromperiod: a Python package for detecting periodic chromatin organization from consecutive accessibility peaks. *bioRxiv* [DOI to be assigned].

---

## Installation

```bash
pip install chromperiod
```

Or from source:

```bash
git clone https://github.com/wgebhardt/chromperiod
cd chromperiod
pip install -e .
```

**Requirements:** Python ≥ 3.9, numpy ≥ 1.21, scipy ≥ 1.7, matplotlib ≥ 3.5, pandas ≥ 1.3

---

## Quickstart

```python
from chromperiod import load_narrowpeak, run_chromosome, run_iaaft_null
from chromperiod.plotting import plot_gws

# Load K562 ATAC-seq peaks (ENCODE ENCFF361VGY)
df = load_narrowpeak("ENCFF361VGY.narrowPeak.gz")

# Run CWT on chromosome 1 (Paul m=2, the canonical setting of the Nature manuscript)
result = run_chromosome(df, chromosome="chr1")
print(result)
# CWTResult(chrom=chr1, n_peaks=..., dominant_period=... Mbp, sig95=..., ar1=...)

# Run IAAFT-1000 null (primary null model)
result = run_iaaft_null(result, n_surrogates=1000, seed=42)
print(f"IAAFT sig95: {result.iaaft_sig95:.1%}")

# Plot GWS with IAAFT null overlay
fig, ax = plot_gws(result, period_units="mbp", show_iaaft_null=True)
fig.savefig("gws_chr1.png", dpi=300, bbox_inches="tight", pad_inches=0.05)
```

---

## Method

### The consecutive-peak representation

Standard spectral analysis of chromatin accessibility is confounded by genomic autocorrelation: gene-rich regions cluster accessible sites while heterochromatic regions are sparse, creating a dominant low-frequency trend that masks periodic structure. The consecutive-peak representation addresses this by using the ordered sequence of peak scores as the input signal for CWT analysis, indexed by each peak's rank along the chromosome (first peak, second peak, and so on) rather than by its genomic coordinate.

This representation performs a nonlinear compression of the genome:
- Gene-poor heterochromatic stretches (few peaks) → collapsed
- Gene-rich euchromatic regions (many peaks) → expanded

The result is a roughly 4-fold reduction in lag-1 autocorrelation (AR1 ≈ 0.15 in peak-index space vs ≈ 0.62 in coordinate space), which exposes periodic fluctuations in chromatin state that the coordinate-space trend otherwise masks.
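
As a minimal sketch of the idea (not the package's implementation), the snippet below builds a peak-index signal from a few invented narrowPeak rows and estimates its lag-1 autocorrelation. The helper names `peak_index_signal` and `lag1_autocorr` are hypothetical, and taking column 7 (`signalValue`) as the peak score is an assumption; the package may use a different column.

```python
# Toy narrowPeak rows (tab-separated); all values are invented for illustration.
ROWS = (
    "chr1\t1000\t1200\t.\t0\t.\t5.0\t-1\t-1\t100\n"
    "chr1\t90000\t90300\t.\t0\t.\t2.0\t-1\t-1\t150\n"
    "chr1\t5000\t5150\t.\t0\t.\t8.0\t-1\t-1\t75\n"
    "chr2\t300\t600\t.\t0\t.\t9.0\t-1\t-1\t120\n"
    "chr1\t2500000\t2500200\t.\t0\t.\t3.0\t-1\t-1\t90\n"
)


def peak_index_signal(text, chromosome):
    """Return peak scores for one chromosome, ordered by start coordinate."""
    peaks = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 7 and fields[0] == chromosome:
            # (start, score); column 7 as the score is an assumption
            peaks.append((int(fields[1]), float(fields[6])))
    peaks.sort()  # order along the chromosome
    # Genomic coordinates are dropped here: only the rank order survives.
    return [score for _, score in peaks]


def lag1_autocorr(x):
    """Lag-1 autocorrelation of a sequence of numbers."""
    mean = sum(x) / len(x)
    d = [v - mean for v in x]
    return sum(a * b for a, b in zip(d[:-1], d[1:])) / sum(a * a for a in d)


signal = peak_index_signal(ROWS, "chr1")
print(signal)  # [5.0, 8.0, 2.0, 3.0]
print(lag1_autocorr(signal))
```

Note that the 2.4 Mbp gap before the last peak has no effect on the signal: the last two peaks are adjacent samples regardless of their genomic distance.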

### Paul m=2 wavelet (primary)

The package uses the Paul wavelet of order m=2 (Torrence & Compo 1998) as the primary analysis wavelet:

- **Fourier factor**: λ = 4π/(2m+1) = 4π/5 ≈ 2.513
- **COI factor**: λ/√2 ≈ 1.777
- **Power normalization**: |W_n(s)|²/σ² (no division by scale)
- **GWS**: time-average of power outside cone of influence (T&C eq. 22)
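
The quantities in the list above can be illustrated with a small FFT-based transform that follows Torrence & Compo (1998). This is a sketch under stated assumptions, not the package's code: `paul_fourier_factor` and `paul_power` are hypothetical names, the sample spacing is one peak index, and the spectrum is averaged over all positions without COI masking.

```python
import math

import numpy as np


def paul_fourier_factor(m=2):
    """Ratio of Fourier period to wavelet scale for the Paul wavelet."""
    return 4.0 * math.pi / (2 * m + 1)


def paul_power(x, periods, m=2):
    """Normalized wavelet power |W_n(s)|^2 / sigma^2, shape (n_periods, n)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x_hat = np.fft.fft(x - x.mean())
    omega = 2.0 * np.pi * np.fft.fftfreq(n)       # angular frequency, dt = 1
    positive = omega > 0                           # Heaviside step H(omega)
    norm = 2.0**m / math.sqrt(m * math.factorial(2 * m - 1))
    scales = np.asarray(periods, dtype=float) / paul_fourier_factor(m)
    power = np.empty((scales.size, n))
    for i, s in enumerate(scales):
        sw = s * omega[positive]
        psi_hat = np.zeros(n)
        # Paul wavelet in Fourier space, scaled to unit energy at each scale
        psi_hat[positive] = math.sqrt(2.0 * math.pi * s) * norm * sw**m * np.exp(-sw)
        w = np.fft.ifft(x_hat * psi_hat)           # W_n(s) for all n at once
        power[i] = np.abs(w) ** 2 / x.var()        # no division by scale
    return power


# Toy input: a sine wave with a period of 64 samples
t = np.arange(1024)
x = np.sin(2.0 * np.pi * t / 64.0)
periods = np.geomspace(8, 512, 80)
gws = paul_power(x, periods).mean(axis=1)          # average along the index axis

print(round(paul_fourier_factor(2), 3))            # 2.513
print(periods[np.argmax(gws)])                     # grid period closest to 64
```

For a pure sine wave the spectrum peaks at the scale whose Fourier period equals the true period, which is what the Fourier factor is for.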

> **Note on wavelet order**: Paul m=4 is available as `wavelet='paul', order=4` for backward compatibility with earlier MCF-7 analyses, but is **NOT** the Nature manuscript canonical. Always use the default `order=2` for new analyses.

### IAAFT-1000 null model (primary)

The Iterative Amplitude-Adjusted Fourier Transform (IAAFT) null preserves both the amplitude distribution and, to a close approximation, the power spectrum of the original signal. Simple permutation preserves only the amplitude distribution, and phase randomization preserves only the power spectrum. IAAFT is therefore the appropriate null for detecting periodicity in signals with non-Gaussian amplitude distributions and colored power spectra.

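```python
# Continues from the Quickstart: `result` is the CWTResult from run_chromosome().
# Assumption: the null helpers are importable from the top-level package,
# as run_iaaft_null is in the Quickstart.
from chromperiod import run_iaaft_null, run_ar1_null, run_permutation_null

# Primary null (canonical setting of the manuscript)
result = run_iaaft_null(result, n_surrogates=1000)

# Faster approximations (not the manuscript setting; use for exploration only)
ar1_null = run_ar1_null(result)           # AR(1) analytical threshold
perm_null = run_permutation_null(result)  # Simple permutation
```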
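
The iteration behind a single IAAFT surrogate can be sketched in a few lines of NumPy. This is a generic version of the algorithm, not the package's implementation; the function name `iaaft_surrogate`, the fixed iteration count, and the toy signal are all illustrative assumptions.

```python
import numpy as np


def iaaft_surrogate(x, n_iter=100, seed=None):
    """One IAAFT surrogate: same values as x, similar power spectrum."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    sorted_x = np.sort(x)
    target_amp = np.abs(np.fft.rfft(x))            # Fourier amplitudes to preserve
    s = rng.permutation(x)                         # start from a random shuffle
    for _ in range(n_iter):
        # Step 1: keep the current phases, impose the target amplitudes
        phases = np.angle(np.fft.rfft(s))
        s = np.fft.irfft(target_amp * np.exp(1j * phases), n=x.size)
        # Step 2: restore the original value distribution by rank matching
        ranks = np.argsort(np.argsort(s))
        s = sorted_x[ranks]
    return s


# Skewed, autocorrelated toy signal (exponential of an AR(1) process)
rng = np.random.default_rng(0)
noise = rng.standard_normal(512)
ar1 = np.empty(512)
ar1[0] = noise[0]
for i in range(1, 512):
    ar1[i] = 0.6 * ar1[i - 1] + noise[i]
x = np.exp(0.5 * ar1)

surrogate = iaaft_surrogate(x, seed=1)
print(np.array_equal(np.sort(surrogate), np.sort(x)))  # True
```

Because the loop ends on the rank-matching step, the value distribution is reproduced exactly while the spectrum is matched only approximately. A per-scale threshold would then come from repeating this with different seeds and taking a percentile of the surrogate spectra.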

---

## Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `wavelet` | `'paul'` | Wavelet type: `'paul'` (primary), `'morlet'`, `'dog'` |
| `order` | `2` | Wavelet order. **m=2 is the canonical setting of the Nature manuscript**; m=4 is legacy. |
| `n_scales` | `80` | Number of log-spaced scales |
| `period_min` | `10` | Minimum period in peak-index units |
| `period_max` | `7000` | Maximum period in peak-index units |
| `significance_level` | `0.95` | AR1 significance level (for AR1 null only) |
| `n_surrogates` | `1000` | Number of IAAFT surrogates |
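
To make the scale grid concrete, the snippet below shows what the defaults would give if the periods are spaced geometrically and converted to Paul m=2 scales with the Fourier factor 4π/5. The geometric spacing rule and the helper name `period_grid` are assumptions; the package may construct its grid differently.

```python
import math


def period_grid(period_min=10.0, period_max=7000.0, n_scales=80):
    """Log-spaced periods from period_min to period_max, inclusive."""
    ratio = (period_max / period_min) ** (1.0 / (n_scales - 1))
    return [period_min * ratio**i for i in range(n_scales)]


periods = period_grid()
fourier_factor = 4.0 * math.pi / 5.0               # Paul m=2
scales = [p / fourier_factor for p in periods]     # wavelet scales, peak-index units

print(len(periods))                                # 80
print(periods[0], round(periods[-1], 6))
print(round(scales[0], 3))
```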

---

## Output: CWTResult

| Attribute | Description |
|-----------|-------------|
| `.power` | Wavelet power matrix (n_scales × n_peaks) |
| `.gws` | Global wavelet spectrum |
| `.periods` | Period array in peak-index units |
| `.periods_mbp` | Period array in Mbp |
| `.dominant_period` | GWS-peak dominant period (peak-index units) |
| `.dominant_period_mbp` | GWS-peak dominant period in Mbp |
| `.sig95` | Fraction of significant area outside COI (AR1 null) |
| `.ar1_alpha` | Lag-1 autocorrelation |
| `.iaaft_sig95` | Fraction significant vs IAAFT-1000 null (primary) |
| `.iaaft_surr_p95` | 95th percentile of IAAFT surrogate GWS |
| `.iaaft_pvalues` | Phipson-Smyth p-values per scale |
| `.coi` | Cone of influence array |
| `.coi_frac` | COI-accessible fraction |
| `.dom_period_near_coi_edge` | True if dominant period is near COI boundary |
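
The Phipson-Smyth estimator behind `.iaaft_pvalues` adds one to both the exceedance count and the number of surrogates, so a Monte Carlo p-value is never exactly zero. A minimal sketch, assuming the statistic is the spectrum value at each scale and a one-sided "greater or equal" comparison; the function name and array layout are hypothetical:

```python
import numpy as np


def phipson_smyth_pvalues(observed, surrogates):
    """p = (b + 1) / (m + 1), where b counts surrogates >= observed, per scale.

    observed:   shape (n_scales,)
    surrogates: shape (n_surrogates, n_scales)
    """
    observed = np.asarray(observed, dtype=float)
    surrogates = np.asarray(surrogates, dtype=float)
    exceed = (surrogates >= observed).sum(axis=0)
    return (exceed + 1.0) / (surrogates.shape[0] + 1.0)


# Two scales, four surrogates (invented numbers)
observed = np.array([5.0, 1.0])
surrogates = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 3.0], [4.0, 1.5]])
print(phipson_smyth_pvalues(observed, surrogates))  # [0.2 0.8]
```

With 1000 surrogates the smallest attainable p-value under this formula is 1/1001.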

---

## Example notebooks

Three reproducible Jupyter notebooks are provided in `examples/`. All use real public data — no synthetic results.

| Notebook | Data | Description |
|----------|------|-------------|
| `01_single_chromosome_demo.ipynb` | K562 ENCFF361VGY | Full pipeline walkthrough on chr1 |
| `02_genome_wide_demo.ipynb` | K562 ENCFF361VGY | All 22 autosomes, GWS summary |
| `03_scaling_law_demo.ipynb` | K562 + GM12878 | Cross-cell-line scaling fit |

See `examples/example_data/README.md` for download instructions.

---

## Validation

The package ships with 14 unit tests covering:

- White noise false-positive rate: mean sig95 = 5% ± 1.2% (target: [2%, 10%])
- Sine wave period recovery: within 15% of true period at SNR=2
- IAAFT surrogate properties: amplitude distribution and power spectrum preserved
- Cross-wavelet robustness: scaling exponent b = 0.77–0.80 across Paul m=2/m=4, Morlet, DOG
- Round-trip validation: reproduces v38 K562 canonical numbers on all 22 autosomes

```bash
# Run all tests (excluding network-dependent canonical roundtrip)
pytest tests/ -v -m "not network"

# Run canonical roundtrip (requires ENCFF361VGY cached or network access)
pytest tests/test_v38_canonical_roundtrip.py -v -m network
```

---

## Citation

If you use `chromperiod` in your research, please cite:

```bibtex
@article{gebhardt2026chromperiod,
  author  = {Gebhardt, Wolf H.},
  title   = {chromperiod: a Python package for detecting periodic chromatin
             organization from consecutive accessibility peaks},
  journal = {bioRxiv},
  year    = {2026},
  doi     = {10.1101/2026.XX.XX.XXXXXX}
}
```

and the underlying wavelet method:

```bibtex
@article{torrence1998,
  author  = {Torrence, Christopher and Compo, Gilbert P.},
  title   = {A practical guide to wavelet analysis},
  journal = {Bulletin of the American Meteorological Society},
  volume  = {79},
  number  = {1},
  pages   = {61--78},
  year    = {1998}
}
```

---

## License

MIT License. Copyright (c) 2026 Wolf H. Gebhardt.

The example notebooks reference public ENCODE data released under CC-BY-4.0.
