Forward and reverse Enzyme tests and rules for linalg #449

Open
kshyatt wants to merge 22 commits into main from ksh/enz_linalg

kshyatt (Member) commented Jun 10, 2026

Trying to make these a little more manageable and to pick up the forward-mode rules where possible.

kshyatt requested review from Jutho and lkdvos on June 10, 2026 13:30
codecov (bot) commented Jun 10, 2026

Codecov Report

❌ Patch coverage is 68.82591% with 77 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ext/TensorKitEnzymeExt/linalg.jl | 64.17% | 48 Missing ⚠️ |
| ext/TensorKitEnzymeExt/utility.jl | 15.00% | 17 Missing ⚠️ |
| ext/TensorKitEnzymeTestUtilsExt.jl | 74.35% | 10 Missing ⚠️ |
| src/auxiliary/ad.jl | 85.71% | 2 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| ext/TensorKitChainRulesCoreExt/tensoroperations.jl | 88.99% <100.00%> (ø) |
| ext/TensorKitChainRulesCoreExt/utility.jl | 100.00% <ø> (+20.00%) ⬆️ |
| ext/TensorKitEnzymeExt/TensorKitEnzymeExt.jl | 100.00% <100.00%> (ø) |
| ext/TensorKitMooncakeExt/indexmanipulations.jl | 96.11% <100.00%> (ø) |
| ext/TensorKitMooncakeExt/linalg.jl | 99.09% <100.00%> (-0.01%) ⬇️ |
| ext/TensorKitMooncakeExt/tensoroperations.jl | 91.66% <100.00%> (ø) |
| ext/TensorKitMooncakeExt/utility.jl | 100.00% <100.00%> (+28.57%) ⬆️ |
| ext/TensorKitMooncakeExt/vectorinterface.jl | 100.00% <100.00%> (ø) |
| src/TensorKit.jl | 13.79% <ø> (ø) |
| src/auxiliary/ad.jl | 85.71% <85.71%> (ø) |

... and 3 more

... and 3 files with indirect coverage changes

Comment thread ext/TensorKitEnzymeExt/linalg.jl
github-actions (bot, Contributor) commented Jun 11, 2026

Your PR no longer requires formatting changes. Thank you for your contribution!

kshyatt marked this pull request as draft on June 11, 2026 07:18
kshyatt marked this pull request as ready for review on June 11, 2026 09:26
kshyatt (Member, Author) commented Jun 12, 2026

The test on 1.12 is passing locally for me! I assume it's getting OOMed or something...

kshyatt (Member, Author) commented Jun 12, 2026

OK, everything looks happy now except the GPU stuff, which is unrelated. Are we good to go?

Comment thread ext/TensorKitEnzymeExt/utility.jl Outdated
Comment thread ext/TensorKitEnzymeExt/utility.jl Outdated
Comment thread ext/TensorKitEnzymeExt/linalg.jl Outdated
Comment thread ext/TensorKitEnzymeExt/linalg.jl Outdated
lkdvos (Member) commented Jun 16, 2026

Do we think the test failure is a problem with how LRU interacts with Enzyme?

From the stacktrace, I read this as a key not being found, even though the lookup sits inside an if-clause that explicitly checks for it: https://github.com/JuliaCollections/LRUCache.jl/blob/1dad9fef75fef51ea1b7e984e5850ad4e374a7e0/src/LRUCache.jl#L172-L175

The really confusing part to me is that it seems to originate from a forward call, which should just be a regular function call, so I'm not sure what is really going on there. I also don't think this can really be a race condition, since 1) I don't think we are multithreading, 2) LRU protects against this?

kshyatt (Member, Author) commented Jun 16, 2026

It also seems to only happen in the CompatCheck tests, not the main ones

kshyatt (Member, Author) commented Jun 16, 2026

Let me just see if bumping the LRUCache compat helps at all...

kshyatt (Member, Author) commented Jun 16, 2026

OK that makes CompatCheck pass and the min test fail. It seems the failures only happen on 1.10 regardless, but they are intermittent. Really annoying.

kshyatt (Member, Author) commented Jun 16, 2026

Also, locally I can see it happen in reverse calls, so I don't think it's really to do with forward mode.

kshyatt (Member, Author) commented Jun 16, 2026

Removing the `@cached` in front of the definition of `degeneracystructure` does seem to fix this. Maybe there's a way to fill the cache before the Enzyme tests run? I'll take a look.

kshyatt (Member, Author) commented Jun 16, 2026

OK, so after more digging, it looks like the problem here is 1.10 + Enzyme + the `@cached` macro. I'm ok with disabling this set of tests on 1.10 for now while I try to work with the Enzyme people to figure out what's going on. Does that sound reasonable?

Comment thread ext/TensorKitEnzymeExt/linalg.jl
Comment thread ext/TensorKitEnzymeExt/linalg.jl Outdated
Comment thread ext/TensorKitEnzymeExt/linalg.jl Outdated
Comment thread ext/TensorKitEnzymeExt/linalg.jl Outdated
Comment thread ext/TensorKitMooncakeExt/indexmanipulations.jl Outdated
Comment thread ext/TensorKitEnzymeExt/linalg.jl Outdated
Comment thread ext/TensorKitEnzymeExt/linalg.jl Outdated
kshyatt (Member, Author) commented Jun 17, 2026

Some more strangeness: the missing key error only ever occurs with `Vtr`; I cannot trigger it with `VRepU₁` or any of the other AD spaces. This makes me wonder if the `@cached` stuff is kind of a red herring -- I'll dig into this more today.

lkdvos (Member) commented Jun 23, 2026

I would be fine disabling this for now and leaving an issue open; it's a bit silly to block ourselves on this.

kshyatt (Member, Author) commented Jun 23, 2026

FWIW I have noticed that if I force TensorKit to empty the global caches after each `test_reverse`/`test_forward` in `test/enzyme-linalg/mul.jl`, it appears to """fix""" things on 1.10. That would also be a way to keep testing this there without completely disabling it?

kshyatt (Member, Author) commented Jun 23, 2026

Actually on 1.10, we could just test with a slightly more "slender" domain which has only 2 complex spaces in the product space, no cache emptying needed. I think that works best and I will leave comments in the file explaining as well.

kshyatt (Member, Author) commented Jun 23, 2026

Anyone have more comments here?

Comment thread ext/TensorKitEnzymeExt/linalg.jl
3 participants