
Fix inconsistent fingerprints #90

Merged
snejus merged 3 commits into master from inconsistent_feed_depending_on_blocksize
Apr 9, 2026

@semohr
Contributor

@semohr semohr commented Jan 7, 2026

Depending on the Python version and hardware, io.DEFAULT_BUFFER_SIZE varies. We use this value to determine the size of the blocks we feed to the chromaprint library. This can influence the generated fingerprints if the block sizes do not align with the maximum number of samples (which they hardly ever do), because the final block is fed in full even when it extends past that limit.
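A minimal sketch of the effect described above, assuming 16-bit PCM and a loop that feeds each block in full and only checks the sample budget afterwards. This illustrates the behavior discussed in this PR and is not claimed to be pyacoustid's exact code; `RecordingFingerprinter` and `feed_prefix` are hypothetical names, and the stand-in only counts samples instead of calling chromaprint.

```python
class RecordingFingerprinter:
    """Hypothetical stand-in for a chromaprint fingerprinter that only
    records how many samples it was fed."""

    def __init__(self):
        self.samples_fed = 0

    def feed(self, block: bytes) -> None:
        self.samples_fed += len(block) // 2  # 16-bit samples: 2 bytes each


def feed_prefix(pcm: bytes, block_size: int, endposition: int) -> int:
    """Feed whole blocks, checking the sample budget only after each feed."""
    fper = RecordingFingerprinter()
    position = 0  # samples fed so far
    for start in range(0, len(pcm), block_size):
        block = pcm[start:start + block_size]
        fper.feed(block)             # the whole block is always consumed
        position += len(block) // 2
        if position >= endposition:  # may already be past the budget
            break
    return fper.samples_fed


# 1000 samples of silence, but only 300 samples should be fingerprinted.
pcm = bytes(2000)
for block_size in (64, 256, 1000):
    print(block_size, feed_prefix(pcm, block_size, endposition=300))
# 64 320
# 256 384
# 1000 500
```

With the same 300-sample budget, the amount of audio that reaches the fingerprinter changes with the block size alone.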

Decision needed:
How do we want to approach this? The fix can be considered a bugfix but also a breaking change: the fingerprints generated previously (using too much data from the last block) will not match the ones generated after the change. In my opinion this is a bug, although fixing it might introduce issues for some users.

closes #89

@grawlinson

Package maintainer for Arch Linux here: if you do not fix it now, you're just kicking the can down the road. Issues are already popping up, and there will be more as Python 3.13 and 3.14 become more widespread. Just my 2 cents.

@snejus
Member

snejus commented Jan 14, 2026

How does this affect matching fingerprints that have already been uploaded to AcoustID?

@semohr
Contributor Author

semohr commented Jan 14, 2026

It shouldn't matter much; they will be different, of course. But comparison should be done with a distance measure anyway, and results should now be closer to the true distribution. We should test this, though; I'm not sure how the implementation looks on their side.

Effectively, only the last few bytes change (because of how the algorithm works), and only for long songs (longer than 120 s).
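To see why only long tracks are affected, here is a rough calculation assuming 16-bit PCM, whole blocks fed until the budget is crossed, and the 120 s cap mentioned above. The block sizes are illustrative values, not a statement of what any particular Python version uses; `overshoot_samples` is a hypothetical helper.

```python
def overshoot_samples(duration_s, block_bytes, samplerate=44100, channels=2, maxlength_s=120):
    """Samples fed beyond the budget by a loop that only stops after a
    whole block has crossed it (16-bit PCM, i.e. 2 bytes per sample)."""
    endposition = samplerate * channels * maxlength_s  # sample budget
    total = samplerate * channels * duration_s         # samples in the track
    if total <= endposition:
        return 0  # the input runs out before the budget is reached
    per_block = block_bytes // 2
    blocks_needed = -(-endposition // per_block)  # ceiling division
    fed = min(blocks_needed * per_block, total)   # the last input block may be short
    return fed - endposition


for block_bytes in (8192, 131072):
    print(block_bytes, overshoot_samples(360, block_bytes), overshoot_samples(90, block_bytes))
# 8192 64 0
# 131072 32832 0
```

A 90 s track never reaches the budget, so it is unaffected. For a 6-minute track the extra data ranges from 64 to 32,832 samples (about 0.37 s of stereo audio at 44.1 kHz), depending only on the block size.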

The only issue I see is in beets: I'm not sure how and where we do fingerprint comparisons, but if they are equality checks rather than distance-based checks with a cutoff, this will introduce issues.
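The difference between an equality check and a distance check with a cutoff can be sketched as follows. This is a generic bit-error-rate comparison over raw 32-bit fingerprint items that ignores time offsets; it is not beets' or AcoustID's actual matching code, and the 0.1 cutoff is an arbitrary assumption. `bit_error_rate` and `is_match` are hypothetical helpers.

```python
def bit_error_rate(fp_a: list[int], fp_b: list[int]) -> float:
    """Fraction of differing bits over the overlapping 32-bit items."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 1.0
    # Mask to 32 bits so signed (negative) items are handled too.
    diff = sum(bin((a ^ b) & 0xFFFFFFFF).count("1") for a, b in zip(fp_a, fp_b))
    return diff / (32 * n)


def is_match(fp_a: list[int], fp_b: list[int], cutoff: float = 0.1) -> bool:
    """Distance-based check: tolerate a small fraction of flipped bits."""
    return bit_error_rate(fp_a, fp_b) <= cutoff


old = [0x12345678, 0x9ABCDEF0, 0x0F0F0F0F, 0xDEADBEEF]
new = [0x12345678, 0x9ABCDEF0, 0x0F0F0F0F, 0xDEADBEEE]  # one bit flipped

print(old == new)                # False
print(bit_error_rate(old, new))  # 0.0078125
print(is_match(old, new))        # True
```

An equality check rejects these two fingerprints, while the distance check accepts them.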

As a side note, the fingerprints still differ from the ones generated by the fpcalc CLI tool, but the average error in my local testing seems to be lower, i.e. they are closer by distance.

As another note, the fingerprints will also differ slightly if you use a different spectrum extraction backend, so some inconsistency is always expected.

@semohr semohr force-pushed the inconsistent_feed_depending_on_blocksize branch from dc1af3b to 98bbd65 on April 9, 2026 11:53
semohr added 2 commits April 9, 2026 14:55
we use to ingest/feed in fingerprinting can influence the generated
fingerprints.

This fixes the issue by only consuming the expected samples.
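The fix described in this commit message (only consuming the expected samples) can be sketched by trimming the final block to the remaining budget. This assumes 16-bit PCM with even block sizes and is not claimed to be the merged code; `feed_fixed` is a hypothetical helper that returns the bytes it would feed instead of calling chromaprint.

```python
def feed_fixed(pcm: bytes, block_size: int, endposition: int) -> bytes:
    """Return the bytes that would be fed, never exceeding `endposition`
    16-bit samples, by trimming the final block."""
    fed = bytearray()
    position = 0  # samples fed so far
    for start in range(0, len(pcm), block_size):
        block = pcm[start:start + block_size]
        remaining = endposition - position
        if len(block) // 2 > remaining:
            block = block[: remaining * 2]  # keep only the samples still needed
        fed += block  # stand-in for feeding the fingerprinter
        position += len(block) // 2
        if position >= endposition:
            break
    return bytes(fed)


pcm = bytes(range(256)) * 8  # 2048 bytes (1024 samples) of dummy PCM data
results = [feed_fixed(pcm, bs, endposition=300) for bs in (64, 256, 1000)]
print([len(r) // 2 for r in results])        # [300, 300, 300]
print(all(r == pcm[:600] for r in results))  # True
```

In this sketch, exactly the first 300 samples reach the fingerprinter regardless of the block size, so the result no longer depends on io.DEFAULT_BUFFER_SIZE.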
@semohr semohr force-pushed the inconsistent_feed_depending_on_blocksize branch from 98bbd65 to 01a5af0 on April 9, 2026 13:18
@semohr
Contributor Author

semohr commented Apr 9, 2026

@snejus I just noticed that this is still open.

I rebased it and added a test that verifies the expected behavior. How do we feel about merging this now?

@semohr semohr requested a review from snejus April 9, 2026 13:25
@semohr semohr force-pushed the inconsistent_feed_depending_on_blocksize branch from e47fa5b to 15f8ae2 on April 9, 2026 13:29
@snejus
Member

snejus commented Apr 9, 2026

Would you mind testing this in beets? The chroma plugin does fingerprint comparison.

@semohr
Contributor Author

semohr commented Apr 9, 2026

We use the proper function to compare the fingerprints: https://github.com/beetbox/beets/blob/4e08403df445e13a49916f3d1390dcc55ab66e5b/beetsplug/chroma.py#L329.
It should be fine as far as I can see, but I'm not really sure how to test this in beets directly.

@snejus
Member

snejus commented Apr 9, 2026

I'm just concerned about the previously calculated fingerprints that have already been uploaded online. I'd like to check that a new fingerprint calculated for the same track (say, 6 minutes long) still matches the old one.

@semohr
Contributor Author

semohr commented Apr 9, 2026

What do you mean by "different"? At the byte level, the fingerprints will of course differ. From a fingerprint-matching perspective, though, they are considered the same.

With the changes, our fingerprints are now more consistent, but for some users they may differ at the byte level from the ones generated before the fix. This happens if their block sizes previously did not align perfectly with maxlength * sample_rate * n_channels.

Importantly, "different" here is not a problem. The algorithm is designed to handle small differences: songs are never entirely identical at the sample level, and minor variations (like a single bit flipped due to sampling or compression) are expected.

If you compute the distance (or match score) between the pre-fix and post-fix fingerprints, it will be very small, though almost never exactly 0.

Example:

  • Pre-fix state: One user uploading song A with blocksize=10 and another user uploading the same song A with blocksize=20 would produce fingerprints that are not equal.
  • Post-fix state: Fingerprints are now equal even when the blocksize does not align properly, making matching much more reliable.

@snejus
Member

snejus commented Apr 9, 2026

Thanks for the explanation, this makes sense to me! Good to go

@snejus snejus merged commit 3bf9636 into master Apr 9, 2026
8 checks passed
@snejus snejus deleted the inconsistent_feed_depending_on_blocksize branch April 9, 2026 16:55
@snejus
Member

snejus commented Apr 9, 2026

@semohr, we should release it to unlock beetbox/beets#6267, right?

@semohr
Contributor Author

semohr commented Apr 9, 2026

Go ahead. I was just trying to rebase that PR and remove the refactor; it was done quite some time ago 😬

@snejus
Member

snejus commented Apr 9, 2026

See #94

snejus added a commit to beetbox/beets that referenced this pull request Apr 11, 2026
This PR adds testing targets for version 3.14, enabling us to verify
compatibility with Python 3.14.

closes #6232 

I would also like to see this addition included here, as 3.14 users will
run into issues without it:
beetbox/pyacoustid#90

Development

Successfully merging this pull request may close these issues.

Inconsistent fingerprints generated in Python 3.13 and 3.14

3 participants