ASB-30568: Adding read_product Function to load files from s3 to memory #3561

Open
AlexReedy wants to merge 19 commits into astropy:main from AlexReedy:ASB-30568_read-product-function

@AlexReedy AlexReedy commented Mar 20, 2026

Adds the ability to read FITS and ASDF data products from s3:// URIs into memory using the Observations.read_product() function.

@bsipocz bsipocz added the mast label Apr 4, 2026
@snbianco snbianco self-requested a review April 9, 2026 14:53

codecov Bot commented Apr 17, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.23%. Comparing base (a0ec925) to head (ac66e84).
⚠️ Report is 1 commits behind head on main.

Files with missing lines           Patch %   Lines
astroquery/mast/observations.py    75.00%    9 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3561   +/-   ##
=======================================
  Coverage   73.23%   73.23%           
=======================================
  Files         226      226           
  Lines       21010    21046   +36     
=======================================
+ Hits        15386    15413   +27     
- Misses       5624     5633    +9     

@AlexReedy AlexReedy marked this pull request as ready for review April 17, 2026 16:57
Member

@bsipocz bsipocz left a comment

The commit history is a bit all over the place. Could you please clean it up and squash it into logical chunks rather than a quasi-random back and forth? Thanks!

I suppose this will also need narrative documentation; but I'll leave the more detailed review to @snbianco

Comment thread astroquery/mast/observations.py Outdated
except Exception as e:
log.exception(f"Failed to open ASD File: {product_path} {e}")
else:
print("Unsupported extension type")
Member

Please don't print anything; either raise a proper warning or error class, or put this behind a verbose option.
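
A minimal sketch of the error-raising option, assuming a hypothetical helper `resolve_reader` and a hypothetical `SUPPORTED_EXTENSIONS` mapping (neither is from the PR). The built-in `ValueError` stands in for whichever astroquery exception class the maintainers would prefer:

```python
import logging
from pathlib import PurePosixPath

log = logging.getLogger(__name__)

# Hypothetical mapping, for illustration only; not taken from the PR.
SUPPORTED_EXTENSIONS = {".fits": "fits", ".asdf": "asdf"}


def resolve_reader(product_path, verbose=False):
    """Return the reader name for a product path, or raise if unsupported."""
    suffix = PurePosixPath(product_path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        # Raise instead of printing so callers can catch and handle the failure.
        raise ValueError(f"Unsupported extension type: {suffix!r}")
    reader = SUPPORTED_EXTENSIONS[suffix]
    if verbose:
        # Optional progress output goes through logging, never print().
        log.info("Reading %s as %s", product_path, reader)
    return reader
```

With this shape, an unsupported path such as one ending in `.txt` surfaces as an exception the caller can handle, rather than as text on stdout.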

Author

@AlexReedy AlexReedy Apr 20, 2026

Ah, thank you! I thought I had gotten rid of all of those.

Comment thread astroquery/mast/observations.py Outdated
'`~astroquery.mast.ObservationsClass.enable_cloud_dataset` method.'
)

asdf_packages = ["asdf", "s3fs", "fsspec", "lz4", "gwcs"]
Member

Do all of these really need to be checked here, even though they are not directly being used? If asdf requires the whole list, then these checks should be dealt with upstream in asdf itself.

Author

This has been an ongoing question with the implementation. asdf, s3fs, and fsspec are required for the function to work. For ASDF specifically, lz4 seems to be the primary compression algorithm in use, and gwcs is for the generalized WCS.

I know that gwcs is in their test environment, and they also make calls to lz4, but I don't believe it's included when asdf is installed. I agree it should probably be handled upstream in asdf.
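
A minimal sketch of checking only the directly required packages, assuming hypothetical helpers `missing_packages` and `require_packages` (not part of the PR). It relies on the standard library's `importlib.util.find_spec`, which returns `None` when a top-level package cannot be found:

```python
from importlib.util import find_spec

# Assumption: only the packages named as required in this thread are checked.
REQUIRED_FOR_ASDF = ["asdf", "s3fs", "fsspec"]


def missing_packages(names):
    """Return the top-level package names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


def require_packages(names):
    """Raise ImportError listing every missing package at once."""
    missing = missing_packages(names)
    if missing:
        raise ImportError(
            "Missing optional dependencies: " + ", ".join(missing)
        )
```

Calling `require_packages(REQUIRED_FOR_ASDF)` before the ASDF branch would report all missing packages in one message, leaving lz4 and gwcs to be resolved by asdf itself.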

@AlexReedy AlexReedy force-pushed the ASB-30568_read-product-function branch from 12ab349 to eb5a1b2 Compare April 28, 2026 17:08
@snbianco
Contributor

Thanks again for this PR, Alex! Can you add a quick section about this function to the Observations docs (docs/mast/mast_obsquery.rst)? You'll probably want to put it in the cloud data access section.

This PR will also need some tests. The non-remote-access tests may be slightly tricky with mocking. Let me look into it and see if I can point you in the right direction. The remote-access tests should be more straightforward.

Contributor

@snbianco snbianco left a comment

Another round of review! This is looking good. A lot of my comments were questions or notes on the docs/docstrings. Let me know if you have questions or want to chat more about the switch to fsspec!

Comment thread astroquery/mast/tests/test_mast.py Outdated
assert Observations._cloud_enabled_explicitly is False


@pytest.fixture
Contributor

The convention we have for fixtures in this file is to put them near the top, before any of the tests. For this one and the s3_asdf_path specifically, though, I'm not sure if we need a fixture, since they're only being used once. I'd rather see the value of s3_fits_path be parametrized.
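
A minimal sketch of the parametrized form, assuming placeholder URIs and a hypothetical test name (neither comes from the PR). The assertion is a stand-in for the real call to `read_product` with mocks:

```python
import pytest


# Placeholder URIs for illustration only; they are not paths from the PR.
@pytest.mark.parametrize(
    "product_path",
    ["s3://example-bucket/example.fits", "s3://example-bucket/example.asdf"],
)
def test_read_product_path(product_path):
    # Stand-in assertion; a real test would call read_product with mocks.
    assert product_path.startswith("s3://")
```

Each value in the list runs as its own test case, so the single-use fixtures are no longer needed.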

Comment thread astroquery/mast/tests/test_mast.py Outdated


def test_read_product_fits(s3_fits_path, mock_fits_open, mocker):
mocker.patch("astropy.__version__", "5.0.0")
Contributor

Why do we need this line?

def test_observations_read_product_asdf(self):
asdf = pytest.importorskip("asdf")

product_path = "s3://stpubdata/roman/nexus/soc_simulations/tutorial_data" \
Contributor

Are there any smaller files that we have available to us from the Nexus? This takes around 15 seconds to load in, and we try to keep this test suite as short in duration as we can.

Author

I can check again, but looking at what's on S3, the tutorial data is the smallest I saw, at 4.0KB.

Comment thread astroquery/mast/observations.py Outdated
Parameters
----------
product_path: str
URI to the product in open bucket.
Contributor

Suggested change
- URI to the product in open bucket.
+ URI to the product in the STScI S3 open data bucket.

Comment thread astroquery/mast/observations.py Outdated
product_path: str
URI to the product in open bucket.
read_as: str, optional
How to read the file. Currently only .fits and .asdf is supported by "auto". Defaults to "auto".
Contributor

Suggested change
- How to read the file. Currently only .fits and .asdf is supported by "auto". Defaults to "auto".
+ How to read the file. Currently only FITS and ASDF file types are supported by "auto". Default is "auto".

Comment thread docs/mast/mast_obsquery.rst Outdated
Comment thread docs/mast/mast_obsquery.rst Outdated

Streaming Data Products from S3 to memory
-----------------------------------------
If instead of downloading you would like to load an S3 URI directly to memory you can use `~astroquery.mast.ObservationsClass.read_product`.
Contributor

Suggested change
- If instead of downloading you would like to load an S3 URI directly to memory you can use `~astroquery.mast.ObservationsClass.read_product`.
+ If instead of downloading you would like to load an S3 URI directly to memory, you can use the `~astroquery.mast.ObservationsClass.read_product` method.

Comment thread docs/mast/mast_obsquery.rst Outdated
Comment thread CHANGES.rst Outdated
Comment thread pyproject.toml Outdated
Contributor

Suggested change
"fsspec[http,s3]",

3 participants