HBASE-30161 Add paginated, single-RPC RegionLocator.getRegionLocations(startKey, limit) API for bulk meta-cache warmup by sanjeet006py · Pull Request #8236 · apache/hbase

sanjeet006py · 2026-05-14T08:20:54Z

Generated-by: Claude Opus 4.7

…s(startKey, limit) API for bulk meta-cache warmup

haridsv · 2026-05-14T11:08:39Z

+  public List<HRegionLocation> getRegionLocations(byte[] startKey, int limit) throws IOException {
+    // No need to page as region locations are already in-memory.
+    return getAllRegionLocations();
+  }


This is actually breaking the contract established by the javadoc by always returning all regions.

Yeah, I was also wondering same. Thanks for bringing this up. Instead of implementing paging here, shall I throw an exception saying use getAllRegionLocations as I don't see advantage of paging in here.

Sure, that works too, as long as you call out the fact that all implementation may not support it and that they should fallback to getAllRegionLocations.

haridsv · 2026-05-14T11:35:32Z

+   *         more regions exist
+   * @throws IOException if a remote or network exception occurs
+   */
+  List<HRegionLocation> getRegionLocations(byte[] startKey, int limit) throws IOException;


Since this is changing a public interface on a stable branch, this will break external code implementing the interface, why not have a default implementation perhaps throw unsupported or return a blank list?

haridsv · 2026-05-14T12:15:07Z

+   * Ordering: regions are returned in ascending region start-key order (the natural order of
+   * {@code hbase:meta} rows for a single table). Within each region, replicas are returned in
+   * ascending replica-id order (replica 0, then 1, then 2, ...). Split parents and offline regions
+   * are filtered out, which may cause a page to contain fewer than {@code limit} regions but never


Where is this filtering happening? I didn't see any test coverage either. The existing methods don't do any such filtering correct, so is this even needed?

Filtering is already implemented and happens in inside MetaTableAccessor.DefaultVisitorBase#visit(). Call chain: HRegionLocator#getRegionLocations() -> MetaTableAccessor.TableVisitorBase#visit() -> MetaTableAccessor.DefaultVisitorBase#visit().

I am reusing the filtering logic so, no test coverage needed.

haridsv · 2026-05-14T12:18:04Z

+   *         more regions exist
+   * @throws IOException if a remote or network exception occurs
+   */
+  List<HRegionLocation> getRegionLocations(byte[] startKey, int limit) throws IOException;


Also, this is overloading an existing methods that takes a data row to find its locations, which is a totally different semantic and so reusing that name for this can be very confusing. Perhaps just call this getLocations or may be you can come up with a better name.

… and add default-throws Review comments on apache#8236 by haridsv: - Adding an abstract method to RegionLocator / AsyncTableRegionLocator breaks external implementers. Convert the new method to a default that throws UnsupportedOperationException, with javadoc instructing callers to fall back to getAllRegionLocations(). - Reusing the getRegionLocations name overloads a method with completely different semantic (row -> all replicas of containing region). Rename to getRegionLocationsPage(byte[] startKey, int limit) to make the pagination/range intent explicit. - SnapshotRegionLocator no longer needs an override; it inherits the default-throws and callers fall back to getAllRegionLocations(). Tests renamed to call getRegionLocationsPage; semantic and assertions unchanged.

… and add default-throws Review comments by haridsv on apache#8236: - Adding an abstract method to RegionLocator on a stable branch breaks external implementers. Convert the new method to a default that throws UnsupportedOperationException, with javadoc instructing callers to fall back to getAllRegionLocations(). - Reusing the getRegionLocations name overloads a method with completely different semantic (row -> all replicas of containing region). Rename to getRegionLocationsPage(byte[] startKey, int limit) to make the pagination/range intent explicit. - SnapshotRegionLocator no longer needs an override; it inherits the default-throws and callers fall back to getAllRegionLocations(). Tests renamed to call getRegionLocationsPage; semantics and assertions unchanged.

Address the canvas review point that the paged meta scan was making ceil(limit / hbase.meta.scanner.caching) ScannerNext RPCs whenever the limit exceeded the configured caching, contradicting the "completes in a single RPC" javadoc on MetaTableAccessor.scanMetaForTableRegions and the "at most one RPC per invocation" claim on RegionLocator.getRegionLocationsPage. Plumb a private isPagedScan flag through MetaTableAccessor.scanMeta and getMetaScan. When true, set setCaching to the caller-supplied rowUpperLimit so the slice returns in a single ScannerNext RPC regardless of the configured caching default. All other existing scanMeta callers (scanByRegionEncodedName, getClosestRegionInfo, scanMetaForTableRegions without rowLimit) keep their existing behavior by passing isPagedScan = false.

…scans Mirrors the master-PR test (TestRegionLocatorPagedScanRpcCount on apache#8237). On the sync path we proxy: Connection.getTable(META_TABLE_NAME) -> Table.getScanner(Scan) -> ResultScanner.close() so we can flip Scan.setScanMetricsEnabled(true) on the way in and read ScanMetrics.countOfRPCcalls before the underlying scanner is closed. The RPC count is the natural sync analogue of the AdvancedScanResultConsumer.onNext invocation count we used on master. Cluster runs with hbase.meta.scanner.caching = 2 against a table of 5 user regions, asserting: - limit <= caching: 1 ScannerNext RPC (baseline) - limit > caching: 1 ScannerNext RPC (regression check; without the isPagedScan fix this would be ceil(5/2) = 3) - unbounded scan: ceil(NUM_REGIONS / caching) = 3 ScannerNext RPCs (proves the isPagedScan flag does not leak into the non-paged callers) Verified the test catches the regression: reverting the isPagedScan branch in MetaTableAccessor.getMetaScan makes testSingleRpcWhenLimitExceedsCaching fail with countOfRPCcalls = 3.

haridsv · 2026-05-15T09:55:06Z

+ * Cluster runs with {@code hbase.meta.scanner.caching = 2} so the {@code limit > caching} branch is
+ * exercised cheaply with a small table (5 user regions).
+ */
+@Category({ MediumTests.class, ClientTests.class })


Considering that the purpose is to verify that the scanner caching value is correct, I feel a mini cluster test is unnecessary and adds too much time on the test duration. Please check if you can achieve this using a mock.

I tried to find a way to do this with a mock and got a bit stuck — wanted to share where I landed.

The property the test is asserting is "exactly one ScannerNext RPC fires for a paged scan, regardless of hbase.meta.scanner.caching." That isn't really a property of MetaTableAccessor in isolation — it's an emergent property of ClientScanner + ScannerCallable + the RegionServer responding to the scan's caching/limit, and ScanMetrics.countOfRPCcalls is incremented inside that machinery on real responses.

Two mock-based shapes I considered, both of which seem to weaken the test:

Capture the Scan and assert scan.getCaching() == rowLimit. This restates the one-line fix in getMetaScan and wouldn't catch a future regression where caching is set correctly but the client-side batching contract changes.

Hand-roll a fake ResultScanner that simulates ScannerNext batching from caching/limit. That re-implements the contract we're verifying inside the test fake and asserts our fake matches our expectation — feels tautological.

If you see a shape I missed that preserves the RPC-count assertion without a cluster, I'd be happy to switch. Otherwise I'd lean toward keeping this as an integration test.

Umeshkumar9414 · 2026-05-15T12:35:43Z

+   * Release {@link #userRegionLock} previously acquired via {@link #takeUserRegionLock()} and
+   * record the held duration in metrics.
+   * @param lockStartTimeMs value of {@link EnvironmentEdgeManager#currentTime()} captured
+   *                        immediately after {@link #takeUserRegionLock()} returned


wonder how spotless didn't take care of spaces.

I think this is valid space as we are listing method parameters and so, spotless also didn't flag it. On viewing in editor it is aligned rightly.

Sanjeet Malhotra added 2 commits May 14, 2026 13:49

HBASE-30161 Add paginated, single-RPC RegionLocator.getRegionLocation…

e62bee5

…s(startKey, limit) API for bulk meta-cache warmup

spotless:apply

3120495

haridsv reviewed May 14, 2026

View reviewed changes

Sanjeet Malhotra added 3 commits May 15, 2026 01:57

sanjeet006py requested a review from haridsv May 15, 2026 05:45

haridsv reviewed May 15, 2026

View reviewed changes

Umeshkumar9414 reviewed May 15, 2026

View reviewed changes

sanjeet006py requested review from Umeshkumar9414 and haridsv May 15, 2026 16:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HBASE-30161 Add paginated, single-RPC RegionLocator.getRegionLocations(startKey, limit) API for bulk meta-cache warmup#8236

HBASE-30161 Add paginated, single-RPC RegionLocator.getRegionLocations(startKey, limit) API for bulk meta-cache warmup#8236
sanjeet006py wants to merge 5 commits into
apache:branch-2from
sanjeet006py:HBASE-30161

sanjeet006py commented May 14, 2026

Uh oh!

haridsv May 14, 2026

Uh oh!

sanjeet006py May 14, 2026

Uh oh!

haridsv May 14, 2026

Uh oh!

haridsv May 14, 2026

Uh oh!

haridsv May 14, 2026

Uh oh!

sanjeet006py May 14, 2026

Uh oh!

sanjeet006py May 14, 2026 •

edited

Loading

Uh oh!

haridsv May 14, 2026

Uh oh!

haridsv May 15, 2026

Uh oh!

sanjeet006py May 15, 2026

Uh oh!

Umeshkumar9414 May 15, 2026

Uh oh!

sanjeet006py May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

sanjeet006py commented May 14, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sanjeet006py May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

sanjeet006py May 14, 2026 •

edited

Loading