Vajra

Break noise. Preserve truth.

A high-performance Rust CLI and library that analyzes arbitrary structured data: JSON, YAML, CSV, NDJSON, Markdown, PDF, git repos, CPU profiles, and strace logs. It extracts structural and semantic signal, detects anomalies and drift, discovers temporal cause-effect chains, and emits compact, deterministic essences optimized for humans, auditors, and AI pipelines.

Install

cargo install vajra-cli

Or from source:

git clone https://github.com/copyleftdev/vajra
cd vajra
cargo build --release

30 Seconds to Value

# Structural analysis
vajra inspect data.json

# Concern-oriented essence for non-technical staff
vajra essence data.json --profile staff

# Anomaly detection
vajra anomalies data.json

# Schema drift between versions
vajra drift v1.json v2.json

# Compact output for LLM consumption
vajra essence data.json --profile ai --format compact-ai --budget 500

# Query with analysis functions
vajra query data.json 'entropy($.claims[*].status) > 0.5'

# Batch analysis with parallel processing
vajra batch data_directory/

# Cluster similar documents
vajra cluster batch/*.json

# Analyze a git repo directly
vajra stats /path/to/repo --input-format git

# Temporal cause-effect chains
vajra cascade commits.json --entity-field '$.file' --time-field '$.date' --event-field '$.intent'

# Time-series windowed stats with trend detection
vajra stats data.json --window month --time-field '$.date'

# Population-level drift comparison
vajra drift data.json --group-by '$.team'

# Semantic path labels for source code
vajra inspect src/main.rs --input-format source --lang rust --semantic-paths

What It Does

Feed Vajra any structured data. It returns shape, signal, anomalies, and truth.

Command	Purpose
`inspect`	Full structural analysis — paths, types, fingerprints, domain recognition
`stats`	Statistical summary — entropy, frequency, numeric distributions
`anomalies`	MAD-based outliers, rarity scoring, type instability
`fingerprint`	BLAKE3 structural hashes, Merkle motifs
`essence`	Concern-oriented reduction — 7 profiles, token budgets, compact-AI
`drift`	Schema drift — JSD, Wasserstein, severity classification
`cluster`	MinHash + LSH similarity clustering
`invariants`	Cross-field relationships — conditional entropy, PMI
`query`	Path expressions with analysis functions
`cascade`	Temporal cause-effect chain detection across events
`batch`	Parallel batch analysis across directories
`profiles`	List available profiles
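
To make "MAD-based outliers" concrete, here is a minimal, std-only sketch of the technique behind the `anomalies` row. It is not Vajra's implementation: the function names, the 0.6745 scaling constant, and the 3.5 cutoff (the conventional modified z-score rule) are assumptions chosen for illustration.

```rust
/// Median of a slice (sorts a copy). Returns None for empty input.
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// Indices of values whose modified z-score exceeds `threshold`.
/// Hypothetical helper; the name and signature are not taken from Vajra.
fn mad_outliers(values: &[f64], threshold: f64) -> Vec<usize> {
    let Some(med) = median(values) else { return Vec::new() };
    let deviations: Vec<f64> = values.iter().map(|v| (v - med).abs()).collect();
    let mad = median(&deviations).unwrap_or(0.0);
    if mad == 0.0 {
        // At least half the values equal the median, so there is no scale to score against.
        return Vec::new();
    }
    values
        .iter()
        .enumerate()
        .filter(|(_, v)| 0.6745 * (**v - med).abs() / mad > threshold)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let amounts = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 250.0];
    println!("{:?}", mad_outliers(&amounts, 3.5)); // prints [6]
}
```

Because the median and MAD ignore the magnitude of extreme values, the single 250.0 cannot inflate the scale it is judged against, which is the practical meaning of the 50% breakdown point.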

Input Formats

Format	Extensions	Auto-Detected
JSON	`.json`	Yes
NDJSON	`.ndjson`, `.jsonl`	Yes
YAML	`.yaml`, `.yml`	Yes
CSV	`.csv`	Yes
TSV	`.tsv`	Yes
Markdown	`.md`, `.markdown`	Yes
PDF	`.pdf`	Yes
Gzip	`.gz`, `.json.gz`	Yes (magic bytes)
Zstd	`.zst`, `.zstd`	Yes (magic bytes)
Git	`--input-format git`	No (explicit)
V8 CPU Profile	`.cpuprofile`	Yes
strace	`--input-format strace`	No (explicit)
HTTP	`http://`, `https://`	Yes
Stdin	`-`	Yes
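
The "magic bytes" entries for Gzip and Zstd refer to sniffing the first bytes of the input rather than trusting the file extension. A rough sketch of that idea follows. The enum and function names are hypothetical, and how Vajra orders its checks is not stated here; the byte signatures themselves come from the gzip and Zstandard format specifications.

```rust
/// Compression wrapper detected from the leading bytes of an input.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
enum Compression {
    Gzip,
    Zstd,
    Uncompressed,
}

/// Inspect the leading bytes and report which compressed container, if any, they match.
fn sniff_compression(bytes: &[u8]) -> Compression {
    // gzip member header starts with ID1 = 0x1f, ID2 = 0x8b.
    const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];
    // Zstandard frame magic number 0xFD2FB528, stored little-endian.
    const ZSTD_MAGIC: [u8; 4] = [0x28, 0xb5, 0x2f, 0xfd];

    if bytes.starts_with(&GZIP_MAGIC) {
        Compression::Gzip
    } else if bytes.starts_with(&ZSTD_MAGIC) {
        Compression::Zstd
    } else {
        Compression::Uncompressed
    }
}

fn main() {
    let gz_header = [0x1f, 0x8b, 0x08, 0x00];
    println!("{:?}", sniff_compression(&gz_header));
    println!("{:?}", sniff_compression(b"{\"claims\": []}"));
}
```

Sniffing content this way is what lets a renamed or extension-less compressed file still be decoded before format detection runs on the payload.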

Profiles

Profile	Emphasizes	Audience
`staff`	Anomalies, structural coverage	Non-technical operations
`engineer`	Type instability, balanced	Developers
`auditor`	Completeness, traceability	Compliance
`ai`	Entropy, coverage, compact output	LLM pipelines
`fraud`	Outliers, rarity, suspicious patterns	Investigation
`health`	Velocity, staleness, bus factor	Project health assessment
Custom	Your weights, your rules	TOML configuration

Domain Plugins

Plugin	Recognizers	Covers
`vajra-domain-med`	ICD-10, CPT, NPI, NDC	Medical/EDI
`vajra-domain-sec`	CVE, MITRE ATT&CK, IPs, hashes, JWT	Security
`vajra-domain-devops`	K8s, Docker, Terraform, ARN, semver	DevOps/Infra
`vajra-domain-github`	PRs, issues, reviews, releases, commits, labels, milestones, deployments, check runs, actions	GitHub
`vajra-domain-source`	Naming conventions, file paths	Source code
`vajra-domain-encoding`	Base64, hex, URL, PEM, layers	Encoding
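
As an illustration of what a recognizer can do beyond pattern matching, the sketch below validates a candidate NPI using its published check-digit rule: a Luhn checksum over the ten digits prefixed with `80840`. This is an assumption about how such a recognizer could work, not a description of `vajra-domain-med`; both function names are hypothetical.

```rust
/// Luhn checksum over decimal digits (values 0-9), rightmost digit is the check digit.
fn luhn_valid(digits: &[u8]) -> bool {
    let sum: u32 = digits
        .iter()
        .rev()
        .enumerate()
        .map(|(i, &d)| {
            let d = d as u32;
            if i % 2 == 1 {
                // Double every second digit from the right; fold two-digit results.
                let doubled = d * 2;
                if doubled > 9 { doubled - 9 } else { doubled }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}

/// True when `candidate` is ten ASCII digits that pass the NPI check-digit rule.
/// Hypothetical recognizer; a real one would likely also score surrounding context.
fn looks_like_npi(candidate: &str) -> bool {
    if candidate.len() != 10 || !candidate.bytes().all(|b| b.is_ascii_digit()) {
        return false;
    }
    // The NPI check digit is computed with the "80840" issuer prefix prepended.
    let digits: Vec<u8> = "80840"
        .bytes()
        .chain(candidate.bytes())
        .map(|b| b - b'0')
        .collect();
    luhn_valid(&digits)
}

fn main() {
    for candidate in ["1234567893", "1234567890", "12345"] {
        println!("{candidate}: {}", looks_like_npi(candidate));
    }
}
```

A checksum test like this lets a recognizer tell a real identifier from an arbitrary ten-digit number, which keeps false positives down on numeric-heavy data.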

Global Options

Flag	Applies To	Purpose
`--window month/week/day`	`stats`	Temporal windowing with trend detection
`--time-field '$.path'`	`stats`, `cascade`	JSONPath to the timestamp field
`--group-by '$.path'`	`drift`	Partition records by field for population-level comparison
`--git-limit N`	`--input-format git`	Cap number of git commits ingested
`--git-branch name`	`--input-format git`	Read from a specific branch
`--semantic-paths`	`inspect`	Map tree-sitter AST paths to human-readable labels (9 languages)
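
The following sketch shows one plausible reading of "temporal windowing with trend detection" for `--window month`: bucket records by the `YYYY-MM` prefix of an ISO date, average each bucket, then fit a least-squares slope across the buckets. The function names, the input shape, and the choice of a linear slope as the trend signal are all assumptions, not Vajra's documented behavior.

```rust
use std::collections::BTreeMap;

/// Mean value per calendar month, ordered by month.
/// Assumes ISO-8601 dates ("YYYY-MM-DD..."); shorter strings are skipped.
fn monthly_means(records: &[(&str, f64)]) -> Vec<(String, f64)> {
    let mut windows: BTreeMap<String, (f64, usize)> = BTreeMap::new();
    for (date, value) in records {
        let Some(month) = date.get(..7) else { continue };
        let slot = windows.entry(month.to_string()).or_insert((0.0, 0));
        slot.0 += value;
        slot.1 += 1;
    }
    windows
        .into_iter()
        .map(|(month, (sum, n))| (month, sum / n as f64))
        .collect()
}

/// Least-squares slope of `ys` against their index (0, 1, 2, ...).
/// Positive means rising, negative means falling. Gaps between windows are ignored.
fn slope(ys: &[f64]) -> f64 {
    if ys.len() < 2 {
        return 0.0;
    }
    let n = ys.len() as f64;
    let mean_x = (n - 1.0) / 2.0;
    let mean_y = ys.iter().sum::<f64>() / n;
    let (mut num, mut den) = (0.0, 0.0);
    for (i, y) in ys.iter().enumerate() {
        let dx = i as f64 - mean_x;
        num += dx * (y - mean_y);
        den += dx * dx;
    }
    num / den
}

fn main() {
    let records = [
        ("2024-01-05", 10.0),
        ("2024-01-20", 14.0),
        ("2024-02-03", 20.0),
        ("2024-03-11", 30.0),
    ];
    let windows = monthly_means(&records);
    let means: Vec<f64> = windows.iter().map(|(_, m)| *m).collect();
    println!("{windows:?}");
    println!("trend slope per window: {:.2}", slope(&means));
}
```

Using a `BTreeMap` keeps the windows in chronological order without a separate sort, since `YYYY-MM` keys sort lexicographically in time order.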

The Engine

Every algorithm was chosen against three gates: works at any scale (O(n) or O(n log n)), battle-tested (published, peer-reviewed), and deterministic (same input = same output, always).

Algorithm	Purpose	Provenance
BLAKE3	All hashing and fingerprinting	O'Connor et al. 2020
Merkle subtree hashing	Structural identity + motif detection	O(n), motifs for free
Shannon entropy	Value diversity measurement	Universal signal primitive
Renyi entropy spectrum	Diversity profiling (H0, H1, H2, H_inf)	Renyi 1961
Lempel-Ziv complexity	Structural complexity beyond entropy	Lempel & Ziv 1976
Transfer entropy	Directed information flow / causality	Schreiber 2000
Total correlation	Multivariate dependency measurement	Watanabe 1960
NCD	Universal similarity via compression	Li et al. 2004
MAD	Robust outlier detection	50% breakdown point
DDSketch	Streaming quantile estimation	Masson et al. 2019 (Datadog)
Count-Min Sketch	Streaming frequency estimation	Cormode & Muthukrishnan 2005
Jensen-Shannon Divergence	Distribution drift measurement	Endres & Schindelin 2003
MinHash + LSH	Scalable similarity clustering	Broder 1997, Indyk & Motwani 1998
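
Two of the primitives in the table, Shannon entropy and Jensen-Shannon divergence, fit in a few lines. The sketch below uses the textbook definitions with base-2 logarithms, so JSD falls in [0, 1]. Function names are hypothetical and nothing here is taken from the `vajra-stats` or `vajra-drift` sources.

```rust
use std::collections::BTreeMap;

/// Empirical probability of each distinct value.
fn distribution(values: &[&str]) -> BTreeMap<String, f64> {
    let mut counts: BTreeMap<String, f64> = BTreeMap::new();
    for v in values {
        *counts.entry((*v).to_string()).or_insert(0.0) += 1.0;
    }
    let total = values.len() as f64;
    counts.values_mut().for_each(|c| *c /= total);
    counts
}

/// Shannon entropy in bits: H(P) = -sum p * log2(p).
fn shannon_entropy(p: &BTreeMap<String, f64>) -> f64 {
    p.values()
        .filter(|&&x| x > 0.0)
        .map(|x| -x * x.log2())
        .sum()
}

/// Jensen-Shannon divergence in bits: H(M) - (H(P) + H(Q)) / 2, with M = (P + Q) / 2.
fn jensen_shannon(p: &BTreeMap<String, f64>, q: &BTreeMap<String, f64>) -> f64 {
    let mut m = BTreeMap::new();
    for (k, v) in p.iter().chain(q.iter()) {
        *m.entry(k.clone()).or_insert(0.0) += v / 2.0;
    }
    shannon_entropy(&m) - (shannon_entropy(p) + shannon_entropy(q)) / 2.0
}

fn main() {
    // Illustrative status fields from two versions of a dataset.
    let v1 = distribution(&["paid", "paid", "denied", "pending"]);
    let v2 = distribution(&["paid", "denied", "denied", "denied"]);
    println!("H(v1) = {:.3} bits", shannon_entropy(&v1));
    println!("JSD(v1, v2) = {:.3} bits", jensen_shannon(&v1, &v2));
}
```

JSD is symmetric and stays finite when a value appears in only one of the two distributions, which makes it a convenient drift score when fields gain or lose categories between versions.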

Testing

1536 tests, 0 failures

64 property tests — every mathematical invariant encoded
68 chaos tests — pathological inputs, no panics
11 differential tests — exact vs streaming equivalence
25 determinism tests — 10-run byte-identical verification
 6 golden tests — regression gate against 31-file corpus
 8 criterion benchmark suites

cargo test --workspace                    # all tests
cargo test -- prop_                       # property tests only
cargo test -- chaos                       # chaos tests only
cargo test -- determinism                 # determinism verification
cargo test -- golden                      # golden regression tests
cargo bench --workspace                   # benchmarks
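
For readers curious what a "10-run byte-identical" determinism check can look like, here is a small std-only sketch. It is not copied from the Vajra test suite; `render_counts` and `is_deterministic` are hypothetical stand-ins for a real analysis and its test harness.

```rust
use std::collections::BTreeMap;

/// Render value counts as a stable string.
/// A BTreeMap iterates in key order; a HashMap would iterate in a
/// per-instance randomized order and could change the output between runs.
fn render_counts(values: &[&str]) -> String {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for v in values {
        *counts.entry(*v).or_insert(0) += 1;
    }
    counts
        .iter()
        .map(|(k, n)| format!("{k}={n}"))
        .collect::<Vec<_>>()
        .join(";")
}

/// Run `run` the given number of times and require byte-identical output each time.
fn is_deterministic<F: Fn() -> Vec<u8>>(run: F, runs: usize) -> bool {
    let first = run();
    (1..runs).all(|_| run() == first)
}

fn main() {
    let input = ["paid", "denied", "paid", "pending"];
    assert!(is_deterministic(|| render_counts(&input).into_bytes(), 10));
    println!("{}", render_counts(&input)); // prints denied=1;paid=2;pending=1
}
```

Comparing raw bytes rather than parsed values catches ordering and formatting drift that a structural comparison would miss.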

Architecture

vajra/
├── vajra-types        # Shared types, traits, scoring
├── vajra-core         # Parsing, paths, canonicalization, streaming, formats, redaction
├── vajra-fingerprint  # BLAKE3, Merkle, MinHash, LSH, NCD, clustering
├── vajra-stats        # Entropy, Renyi, LZ complexity, transfer entropy, total correlation, MAD, DDSketch, CMS, Benford, temporal, relationships
├── vajra-anomaly      # Outlier detection, rarity, type instability
├── vajra-drift        # JSD, Wasserstein, path diff, severity
├── vajra-essence      # Profiles, scoring, rendering, TOML config, chunking
├── vajra-query        # Expression parser, analysis functions
├── vajra-cascade      # Temporal cause-effect chain detection
├── vajra-domain-med   # Medical/EDI plugin (ICD-10, CPT, NPI, NDC)
├── vajra-domain-sec   # Security plugin (CVE, MITRE ATT&CK, IPs, hashes, JWT)
├── vajra-domain-devops # DevOps plugin (K8s, Docker, Terraform, ARN, semver)
├── vajra-domain-github # GitHub plugin (PRs, issues, reviews, releases, actions)
├── vajra-source       # Source code parsing via tree-sitter (9 languages)
├── vajra-domain-source # Source code recognizers (naming conventions, paths)
├── vajra-domain-encoding # Encoding detection (Base64, hex, URL, PEM, layers)
├── vajra-motif        # (reserved)
├── vajra-cli          # CLI commands, batch processing
└── docs/              # mdbook documentation site

Documentation

Full documentation with GSAP-powered kinetic showcase:

cd docs && mdbook serve --open

License

Licensed under either of:

Apache License, Version 2.0 (LICENSE-APACHE)
MIT license (LICENSE-MIT)

at your option.
