Break noise. Preserve truth.
A high-performance Rust CLI and library that analyzes arbitrary structured data — JSON, YAML, CSV, NDJSON, Markdown, PDF, git repos, CPU profiles, strace logs — extracts structural and semantic signal, detects anomalies and drift, discovers temporal cause-effect chains, and emits compact deterministic essences optimized for humans, auditors, and AI pipelines.
cargo install vajra-cli

Or from source:
git clone https://github.com/copyleftdev/vajra
cd vajra
cargo build --release

# Structural analysis
vajra inspect data.json
# Concern-oriented essence for non-technical staff
vajra essence data.json --profile staff
# Anomaly detection
vajra anomalies data.json
# Schema drift between versions
vajra drift v1.json v2.json
# Compact output for LLM consumption
vajra essence data.json --profile ai --format compact-ai --budget 500
# Query with analysis functions
vajra query data.json 'entropy($.claims[*].status) > 0.5'
# Batch analysis with parallel processing
vajra batch data_directory/
# Cluster similar documents
vajra cluster batch/*.json
# Analyze a git repo directly
vajra stats /path/to/repo --input-format git
# Temporal cause-effect chains
vajra cascade commits.json --entity-field '$.file' --time-field '$.date' --event-field '$.intent'
# Time-series windowed stats with trend detection
vajra stats data.json --window month --time-field '$.date'
# Population-level drift comparison
vajra drift data.json --group-by '$.team'
# Semantic path labels for source code
vajra inspect src/main.rs --input-format source --lang rust --semantic-paths

Feed Vajra any structured data. It returns shape, signal, anomalies, and truth.
| Command | Purpose |
|---|---|
| `inspect` | Full structural analysis: paths, types, fingerprints, domain recognition |
| `stats` | Statistical summary: entropy, frequency, numeric distributions |
| `anomalies` | MAD-based outliers, rarity scoring, type instability |
| `fingerprint` | BLAKE3 structural hashes, Merkle motifs |
| `essence` | Concern-oriented reduction: 7 profiles, token budgets, compact-AI |
| `drift` | Schema drift: JSD, Wasserstein, severity classification |
| `cluster` | MinHash + LSH similarity clustering |
| `invariants` | Cross-field relationships: conditional entropy, PMI |
| `query` | Path expressions with analysis functions |
| `cascade` | Temporal cause-effect chain detection across events |
| `batch` | Parallel batch analysis across directories |
| `profiles` | List available profiles |
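
The MAD-based outlier detection listed for `anomalies` can be illustrated with a minimal sketch. This is not Vajra's implementation: the function name `mad_outliers`, the 0.6745 scaling constant, and the idea of passing a cutoff such as 3.5 are assumptions taken from the common "modified z-score" convention.

```rust
/// Median of a slice; returns None for empty input.
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// Indices of values whose modified z-score exceeds `threshold`.
/// Hypothetical helper for illustration, not part of Vajra's API.
pub fn mad_outliers(values: &[f64], threshold: f64) -> Vec<usize> {
    let Some(med) = median(values) else {
        return Vec::new();
    };
    // Median absolute deviation from the median.
    let deviations: Vec<f64> = values.iter().map(|v| (v - med).abs()).collect();
    let mad = median(&deviations).unwrap_or(0.0);
    if mad == 0.0 {
        // The score is undefined when the MAD is zero; report nothing.
        return Vec::new();
    }
    values
        .iter()
        .enumerate()
        .filter(|(_, v)| 0.6745 * (**v - med).abs() / mad > threshold)
        .map(|(i, _)| i)
        .collect()
}
```

Because both the center and the spread are medians, a single extreme value cannot drag the threshold toward itself, which is the practical meaning of a high breakdown point.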
| Format | Extensions | Auto-Detected |
|---|---|---|
| JSON | `.json` | Yes |
| NDJSON | `.ndjson`, `.jsonl` | Yes |
| YAML | `.yaml`, `.yml` | Yes |
| CSV | `.csv` | Yes |
| TSV | `.tsv` | Yes |
| Markdown | `.md`, `.markdown` | Yes |
| PDF | `.pdf` | Yes |
| Gzip | `.gz`, `.json.gz` | Yes (magic bytes) |
| Zstd | `.zst`, `.zstd` | Yes (magic bytes) |
| Git | `--input-format git` | No (explicit) |
| V8 CPU Profile | `.cpuprofile` | Yes |
| strace | `--input-format strace` | No (explicit) |
| HTTP | `http://`, `https://` | Yes |
| Stdin | `-` | Yes |
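
Detection by magic bytes, as noted for Gzip and Zstd above, means looking at the first bytes of the input rather than the file extension. The sketch below shows the idea using the signatures defined by the gzip and Zstandard formats; the `Compression` enum and `sniff_compression` function are hypothetical names, and how Vajra performs this check internally is not shown in this document.

```rust
/// Compression container guessed from the leading bytes of an input.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub enum Compression {
    Gzip,
    Zstd,
    Uncompressed,
}

/// Inspect the first bytes of `bytes` and report which signature matches.
pub fn sniff_compression(bytes: &[u8]) -> Compression {
    // gzip members start with the two ID bytes 0x1f 0x8b.
    const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];
    // Zstandard frames start with 0xFD2FB528 stored little-endian.
    const ZSTD_MAGIC: [u8; 4] = [0x28, 0xb5, 0x2f, 0xfd];

    if bytes.starts_with(&GZIP_MAGIC) {
        Compression::Gzip
    } else if bytes.starts_with(&ZSTD_MAGIC) {
        Compression::Zstd
    } else {
        Compression::Uncompressed
    }
}
```

Sniffing content this way also works for stdin and HTTP inputs, where no extension is available.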
| Profile | Emphasizes | Audience |
|---|---|---|
| `staff` | Anomalies, structural coverage | Non-technical operations |
| `engineer` | Type instability, balanced | Developers |
| `auditor` | Completeness, traceability | Compliance |
| `ai` | Entropy, coverage, compact output | LLM pipelines |
| `fraud` | Outliers, rarity, suspicious patterns | Investigation |
| `health` | Velocity, staleness, bus factor | Project health assessment |
| Custom | Your weights, your rules | TOML configuration |
| Plugin | Recognizers | Covers |
|---|---|---|
| `vajra-domain-med` | ICD-10, CPT, NPI, NDC | Medical/EDI |
| `vajra-domain-sec` | CVE, MITRE ATT&CK, IPs, hashes, JWT | Security |
| `vajra-domain-devops` | K8s, Docker, Terraform, ARN, semver | DevOps/Infra |
| `vajra-domain-github` | PRs, issues, reviews, releases, commits, labels, milestones, deployments, check runs, actions | GitHub |
| `vajra-domain-source` | Naming conventions, file paths | Source code |
| `vajra-domain-encoding` | Base64, hex, URL, PEM, layers | Encoding |
| Flag | Applies To | Purpose |
|---|---|---|
| `--window month/week/day` | `stats` | Temporal windowing with trend detection |
| `--time-field '$.path'` | `stats`, `cascade` | JSONPath to the timestamp field |
| `--group-by '$.path'` | `drift` | Partition records by field for population-level comparison |
| `--git-limit N` | `--input-format git` | Cap number of git commits ingested |
| `--git-branch name` | `--input-format git` | Read from a specific branch |
| `--semantic-paths` | `inspect` | Map tree-sitter AST paths to human-readable labels (9 languages) |
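
To make "temporal windowing with trend detection" concrete, here is a rough sketch of what a monthly window could look like. Everything in it is an assumption for illustration: the function names, the use of an ISO 8601 `YYYY-MM` prefix as the bucket key, and a least-squares slope as the trend measure. The statistics Vajra actually reports per window are not specified here.

```rust
use std::collections::BTreeMap;

/// Count events per calendar month, keyed by the "YYYY-MM" prefix of an
/// ISO 8601 date string. Strings shorter than seven bytes are skipped.
/// Hypothetical helper, not Vajra's API.
pub fn monthly_counts(dates: &[&str]) -> BTreeMap<String, usize> {
    let mut buckets = BTreeMap::new();
    for d in dates {
        if let Some(month) = d.get(..7) {
            *buckets.entry(month.to_string()).or_insert(0) += 1;
        }
    }
    buckets
}

/// Least-squares slope of the counts against their window index.
/// A positive slope suggests an upward trend, a negative one a decline.
pub fn trend_slope(counts: &[usize]) -> f64 {
    if counts.len() < 2 {
        return 0.0;
    }
    let n = counts.len() as f64;
    let mean_x = (n - 1.0) / 2.0;
    let mean_y = counts.iter().sum::<usize>() as f64 / n;
    let (mut num, mut den) = (0.0, 0.0);
    for (i, &c) in counts.iter().enumerate() {
        let dx = i as f64 - mean_x;
        num += dx * (c as f64 - mean_y);
        den += dx * dx;
    }
    num / den
}
```

A `BTreeMap` keeps the windows in chronological order without a separate sort, which also keeps the output order deterministic. Note that this sketch treats consecutive non-empty buckets as adjacent, so months with no events are not represented.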
Every algorithm was chosen against three gates: works at any scale (O(n) or O(n log n)), battle-tested (published, peer-reviewed), and deterministic (same input = same output, always).
| Algorithm | Purpose | Provenance |
|---|---|---|
| BLAKE3 | All hashing and fingerprinting | O'Connor et al. 2020 |
| Merkle subtree hashing | Structural identity + motif detection | O(n), motifs for free |
| Shannon entropy | Value diversity measurement | Universal signal primitive |
| Renyi entropy spectrum | Diversity profiling (H0, H1, H2, H_inf) | Renyi 1961 |
| Lempel-Ziv complexity | Structural complexity beyond entropy | Lempel & Ziv 1976 |
| Transfer entropy | Directed information flow / causality | Schreiber 2000 |
| Total correlation | Multivariate dependency measurement | Watanabe 1960 |
| NCD | Universal similarity via compression | Li et al. 2004 |
| MAD | Robust outlier detection | 50% breakdown point |
| DDSketch | Streaming quantile estimation | Masson et al. 2019 (Datadog) |
| Count-Min Sketch | Streaming frequency estimation | Cormode & Muthukrishnan 2005 |
| Jensen-Shannon Divergence | Distribution drift measurement | Endres & Schindelin 2003 |
| MinHash + LSH | Scalable similarity clustering | Broder 1997, Indyk & Motwani 1998 |
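
Two of the primitives in this table, Shannon entropy and Jensen-Shannon divergence, are short enough to sketch directly. The functions below are textbook definitions in base 2, written for illustration; the names are hypothetical, and whether Vajra reports entropy in bits or in a normalized form is not stated in this document.

```rust
use std::collections::BTreeMap;

/// Shannon entropy, in bits, of the empirical distribution of `values`.
pub fn shannon_entropy<T: Ord>(values: &[T]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    // BTreeMap gives a fixed iteration order, so the floating-point sum
    // is accumulated in the same order on every run.
    let mut counts: BTreeMap<&T, usize> = BTreeMap::new();
    for v in values {
        *counts.entry(v).or_insert(0) += 1;
    }
    let n = values.len() as f64;
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            p * (1.0 / p).log2()
        })
        .sum()
}

/// Jensen-Shannon divergence, in bits, between two discrete distributions
/// given as probability vectors over the same ordered support.
pub fn jensen_shannon(p: &[f64], q: &[f64]) -> f64 {
    assert_eq!(p.len(), q.len(), "distributions must share a support");

    // Kullback-Leibler divergence of `a` from the mixture `m`.
    // Terms with a zero probability in `a` contribute nothing.
    fn kl(a: &[f64], m: &[f64]) -> f64 {
        a.iter()
            .zip(m)
            .map(|(&ai, &mi)| if ai > 0.0 { ai * (ai / mi).log2() } else { 0.0 })
            .sum()
    }

    let m: Vec<f64> = p.iter().zip(q).map(|(&a, &b)| 0.5 * (a + b)).collect();
    0.5 * kl(p, &m) + 0.5 * kl(q, &m)
}
```

With base-2 logarithms the divergence is bounded between 0 (identical distributions) and 1 (disjoint support), which makes it convenient to map onto severity levels.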
1536 tests, 0 failures
64 property tests — every mathematical invariant encoded
68 chaos tests — pathological inputs, no panics
11 differential tests — exact vs streaming equivalence
25 determinism tests — 10-run byte-identical verification
6 golden tests — regression gate against 31-file corpus
8 criterion benchmark suites
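
The "10-run byte-identical verification" idea can be pictured with a small harness like the one below. It is a sketch under assumptions: `is_deterministic` is a hypothetical helper, and the actual determinism tests in the workspace may be structured differently.

```rust
/// Run `render` `runs` times and report whether every output is
/// byte-identical to the first. Hypothetical helper for illustration.
pub fn is_deterministic<F>(mut render: F, runs: usize) -> bool
where
    F: FnMut() -> Vec<u8>,
{
    if runs == 0 {
        return true;
    }
    let first = render();
    // `all` stops at the first mismatching run.
    (1..runs).all(|_| render() == first)
}
```

In a real test the closure would serialize the analysis result for a fixed input, so any dependence on hash-map iteration order, wall-clock time, or thread scheduling shows up as a mismatch.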
cargo test --workspace # all tests
cargo test -- prop_ # property tests only
cargo test -- chaos # chaos tests only
cargo test -- determinism # determinism verification
cargo test -- golden # golden regression tests
cargo bench --workspace # benchmarks

vajra/
├── vajra-types # Shared types, traits, scoring
├── vajra-core # Parsing, paths, canonicalization, streaming, formats, redaction
├── vajra-fingerprint # BLAKE3, Merkle, MinHash, LSH, NCD, clustering
├── vajra-stats # Entropy, Renyi, LZ complexity, transfer entropy, total correlation, MAD, DDSketch, CMS, Benford, temporal, relationships
├── vajra-anomaly # Outlier detection, rarity, type instability
├── vajra-drift # JSD, Wasserstein, path diff, severity
├── vajra-essence # Profiles, scoring, rendering, TOML config, chunking
├── vajra-query # Expression parser, analysis functions
├── vajra-cascade # Temporal cause-effect chain detection
├── vajra-domain-med # Medical/EDI plugin (ICD-10, CPT, NPI, NDC)
├── vajra-domain-sec # Security plugin (CVE, MITRE ATT&CK, IPs, hashes, JWT)
├── vajra-domain-devops # DevOps plugin (K8s, Docker, Terraform, ARN, semver)
├── vajra-domain-github # GitHub plugin (PRs, issues, reviews, releases, actions)
├── vajra-source # Source code parsing via tree-sitter (9 languages)
├── vajra-domain-source # Source code recognizers (naming conventions, paths)
├── vajra-domain-encoding # Encoding detection (Base64, hex, URL, PEM, layers)
├── vajra-motif # (reserved)
├── vajra-cli # CLI commands, batch processing
└── docs/ # mdbook documentation site
Full documentation with GSAP-powered kinetic showcase:
cd docs && mdbook serve --open

Licensed under either of:
at your option.