Skip to content

Added Support for Fetch Modes#4

Open
iliaal wants to merge 149 commits into
SeasX:masterfrom
iliaal:master
Open

Added Support for Fetch Modes#4
iliaal wants to merge 149 commits into
SeasX:masterfrom
iliaal:master

Conversation

@iliaal

@iliaal iliaal commented Dec 15, 2019

Copy link
Copy Markdown

Couple of feature enhancements to simplify data retrieval

  1. Added ability to fetch dates as strings (date as Y-m-d and datetime as Y-m-d H:i:s similar to how it is returned by HTTP client) via DATE_AS_STRINGS mode
  2. Added ability to fetch single value via FETCH_ONE mode
  3. Added ability to retrieve all volumes from single column as an basic array of values via FETCH_COLUMN mode
  4. Added ability to fetch results as an associated array (key-value-pair) via FETCH_KEY_PAIR mode using col1 as index and col2 as value.

Also added SeasClickException class so that exceptions thrown are specific to extension as opposed to using generic Exception class.

@wujunze wujunze requested review from Neeke and aiwhj December 16, 2019 04:32

@769344359 769344359 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove shutdown callback ?

Comment thread SeasClick.cpp
SEASCLICK_RES_NAME,
SeasClick_functions,
PHP_MINIT(SeasClick),
PHP_MSHUTDOWN(SeasClick),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove shutdown callback?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't do anything, so recommended approach is to not invlude PHP_MSHUT / PHP_MINIT unless they do something

Comment thread SeasClick.cpp Outdated
};

#define REGISTER_SC_CLASS_CONST_LONG(const_name, value) \
zend_declare_class_constant_long(SeasClick_ce, const_name, sizeof(const_name)-1, (zend_long)value);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do some compatibility tests, such as zend_long type does not support PHP 5 version

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread SeasClick.cpp Outdated
convertToZval(col2, block[1], row, "", 0, fetch_mode|SC_FETCH_ONE);

if (Z_TYPE_P(col1) == IS_LONG) {
zend_hash_index_update(Z_ARRVAL_P(return_value), Z_LVAL_P(col1), col2);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compatibility

#define zend_hash_index_update(ht, h, pData, nDataSize, pDest) _zend_hash_index_update_or_next_insert(ht, h, pData, nDataSize, pDest, HASH_UPDATE ZEND_FILE_LINE_CC)

This is the definition of PHP 5 version

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@iliaal iliaal requested a review from 769344359 April 6, 2020 12:02
@iliaal iliaal requested a review from aiwhj April 6, 2020 12:47
@guba-odudkin

Copy link
Copy Markdown

@769344359 @Neeke Any chance to get it merged?

@Rock-520

Copy link
Copy Markdown
Member

@769344359 @Neeke Any chance to get it merged?

it will have been merged when i finsh unit tests!

@Rock-520

Rock-520 commented Mar 7, 2023

Copy link
Copy Markdown
Member

image

@guba-odudkin could you please help me to fix the conflicts ? it change a lot of file, i can not merge it .

Resyncs this PR with SeasX/SeasClick master after five years of drift.
Brings in upstream's PHP 8 support, the writeStart/write/writeEnd
block-insert methods, the GCC 10 compile fix, and the package.xml
bumps. Keeps the PR's fetch-mode features (FETCH_ONE, FETCH_KEY_PAIR,
FETCH_COLUMN, DATE_AS_STRINGS), the SeasClickException class, and the
timeout/retry properties.

Conflict resolutions:

- SeasClick.cpp: merged the fetch-mode select() with upstream's write*
  block-insert methods. Standardized arg-info names on SeasClick_*
  (upstream's master shipped a SeasCilck_ typo), and routed every
  catch through sc_zend_throw_exception_tsrmls_cc against
  SeasClickException_ce so PHP 7 and PHP 8 builds both link.
- client.cpp: kept upstream's split InsertData / InsertDataEnd API.
  The PR's PHP-side writeStart/write/writeEnd already calls them in
  that order, so the combined InsertData from the lib upgrade was
  wrong.
- php7_wrapper.h: changed the ZEND_ACC_DTOR fallback from 0x4000 to 0.
  PHP 8 reused bit 14 for ZEND_ACC_VARIADIC, and the old fallback was
  crashing zend_register_functions on extension load.
- nullable.cpp: dropped the duplicate <stdexcept> include the
  auto-merge introduced.

Verified against PHP 7.1, 7.2, 7.4, 8.3 (debug+ASAN), 8.4 (debug+ASAN
and release), and 8.5 (debug+ASAN). Extension loads, all nine methods
and four fetch-mode constants register, and SeasClickException extends
Exception.
@iliaal

iliaal commented Apr 25, 2026

Copy link
Copy Markdown
Author

@Rock-520 Not sure if this repo is still alive, but PR updated and merges cleanly

iliaal added 2 commits April 25, 2026 11:56
The PR's package.xml has declared <min>7.0.0</min> for years and the
upstream code path stopped compiling on PHP 5 several releases ago.
Carrying the dual-PHP scaffolding only obscures the real PHP 7 vs
PHP 8 distinctions, so strip it.

- php7_wrapper.h: deleted the entire PHP_MAJOR_VERSION < 7 branch
  (PHP 5 alternates for hash/zval/exception calls). Removed TSRMLS_CC
  from PHP 7 wrapper bodies (the macro is empty on PHP 7+ and absent
  on PHP 8). Removed the ZEND_ACC_DTOR fallback now that the only
  callsite is gone.
- SeasClick.cpp: collapsed the PHP_VERSION_ID >= 70000 / <= 70000 /
  < 70000 conditionals around class registration, property
  declarations, and write() argument fetching. Stripped TSRMLS_CC
  from every zend_parse_parameters call. Dropped ZEND_ACC_DTOR from
  the __destruct method entry; PHP 7+ recognizes destructors by name.

Net: -155 lines.

Verified against PHP 7.4 (Docker), PHP 8.3 (debug+ASAN), and PHP 8.4
(release). Extension loads, all nine methods and four constants
register, SeasClickException extends Exception.
Replaces the artpaul-fork v1.x vendored library with the official
ClickHouse/clickhouse-cpp v2.6.1. Brings in DateTime64, LowCardinality,
Map, Geo, Time and Time64 column types from upstream, modern
ClientOptions, optional ZSTD compression on the wire, and a current
contrib (cityhash, lz4, abseil-int128, zstd).

PHP-side surface changes:

- DateTime64(N) now accepts epoch-seconds (int) or "Y-m-d H:i:s"
  strings on insert, and reads back either as int or as "Y-m-d
  H:i:s.ffffff" when DATE_AS_STRINGS is set.
- LowCardinality(String) and LowCardinality(FixedString(N)) round-trip
  through the dictionary column. Read returns plain strings.
- Map(K, V) columns can be created via factory; per-row read/write
  through the new ColumnMapT API is left as TODO.

Library / build mechanics:

- Bumped to C++17 (clickhouse-cpp v2 requires it).
- Vendored contrib/zstd in full (compress, decompress, common). zstd
  is a hard dep of the new compressed.cpp -- there is no compile-time
  gate. Built with -DZSTD_DISABLE_ASM since we don't compile the .S
  optimization file.
- Reworked config.m4 source list (~80 sources now) and added include
  paths for the new contrib trees.
- writeStart/write/writeEnd were rewired from the v1.x
  InsertQuery/InsertData/InsertDataEnd API to v2.x
  BeginInsert/SendInsertBlock/EndInsert.
- ClientOptions field renames: SetSocketReceiveTimeout ->
  SetConnectionRecvTimeout, SetSocketConnectTimeout ->
  SetConnectionConnectTimeout.
- Fixed two ZPP / arg-info mismatches PHP 8.3 caught: select() and
  execute() now declare their first argument as the only required
  one. The previous values were inherited from the original PR and
  worked on PHP 8.4 by accident.

Build artifacts now ignored via .gitignore (Makefile, configure,
config.*, autom4te.cache/, build/, modules/, *.dep, *.loT, etc.).

Verified end-to-end against ClickHouse 24.8.14 (docker
clickhouse/clickhouse-server) on PHP 7.4 (docker), PHP 8.3
(debug+ASAN), and PHP 8.4 (release): connect, ping, execute, insert,
select with FETCH_ONE / FETCH_COLUMN / FETCH_KEY_PAIR / DATE_AS_STRINGS,
writeStart/write/writeEnd block insert, DateTime64(3)/(6) round-trip,
LowCardinality(String) round-trip with correct cardinality.

PHP 8.3 ASAN reports indirect leaks (~17KB) inside Client::Impl
construction (NonSecureSocketFactory and friends) that are vendored-
library-internal and only surface at process exit. Functional path is
clean -- no SEGV, no use-after-free.

README updated: dependency list now reflects v2.x types, install
section mentions the C++17 / vendored-contrib model, and added a
"Testing against a local ClickHouse server" section pointing at the
clickhouse/clickhouse-server image with a one-liner docker run.
iliaal and others added 30 commits May 11, 2026 09:34
…t_into wedges on callback throw

Two regressions surfaced in the post-feature review pass.

SS-001 — insertFromStream TSV parser.

InsertStreamParser::feed peeked the byte after a `\` with
`i + 1 < len`. When a 64 KiB read ended exactly on a `\`, the
condition was false, the lone `\` was pushed as a literal, and the
next chunk's first byte was treated as ordinary content. A `\t`
sequence straddling the boundary parsed as the two-char literal "\t"
instead of TAB — silent data corruption for any TSV import where the
chunk slice happened to fall on an escape.

Fix: carry the dangling `\` across feed() calls via a new
pending_backslash flag (same role prev_was_cr plays for CRLF). The
escape's second byte is decoded at the top of the next feed() through
a shared decode_escape_byte lambda, so the inline path and the
cross-chunk path stay in sync. finish() treats a lone trailing `\` at
EOF as a literal terminating backslash.

SS-002 — do_select_into recovery on OnData throw.

selectToStream already wraps client->Select() with ResetConnection on
throw because OnData lambdas that throw (unsupported column type,
short stream write, etc.) leave the native client with unread blocks
on the wire. do_select_into has the same OnData pattern but no
recovery — KEY_PAIR mode against a single-column SELECT throws
"Key pair mode requires at least 2 columns" from inside the callback
and wedges the client. The next query on the same handle then gets
the residual data and looks corrupted. selectWithExternalData, also
shipped in this batch, inherited the gap because it routes through
do_select_into.

Fix: wrap the dispatch (both the plain Select and SelectWithExternalData
calls) in a try/catch that ResetConnection()s before rethrow, matching
the recovery selectToStream uses.

Regression test 107.phpt covers both: a 65 538-byte TSV payload sized
so the escape's `\` lands at offset 65 535 (the chunk boundary), then
a FETCH_KEY_PAIR call against a single-column result followed by a
plain count() to verify the handle stays usable. Also exercises the
same recovery via selectWithExternalData.

Pre-fix output of the regression test:
  len=65540 tab_pos=0 around=\tafte          (escape parsed as literal)
  rowcount after throw: 10                   (got residual id value)
  rowcount after ext throw:                  (got empty / wedged worse)
…ze insertFromStream parser containers

Three small constant-factor wins surfaced in the review pass.

appendCellForStream() called zval_get_string() on every non-NULL cell,
which always allocates a fresh zend_string even when the input zval is
already IS_STRING. convertToZval() with SC_FETCH_DATE_AS_STRINGS hands
back IS_STRING for String / FixedString / Date* / DateTime* / Decimal*
/ Int128 / UInt128 / UUID / IPv4 / IPv6, so the round-trip was a pure
allocator round-trip per cell for half the type table. Fast-path the
IS_STRING case to read Z_STRVAL_P / Z_STRLEN_P directly; numerics
still go through zval_get_string for the snprintf conversion.

InsertStreamParser::cell_buf and row_cells started at zero / SSO
capacity and reallocated on the first row of every import. cell_buf's
SSO is ~15 bytes on common libstdc++ ABIs, so any UUID, URL, or
timestamp cell triggered a heap allocation; row_cells (std::vector<zval>)
went through up to six geometric reallocations before reaching
columns_count. After the first row both stabilize because clear()
preserves capacity, but the first row paid the full cost. Reserve
both at parser setup so the first row is paid for like every
subsequent one.

Smoke: 200k-row TSV ingest into a Memory table runs in ~295 ms post-
patch (~677k rows/s). Pre-patch comparable bench (100k in ~139 ms)
extrapolates similarly, so the wins here are correctness-of-cost
adjustments (no first-row reallocation tax, no per-cell allocator
round-trip for the string types) more than a measurable end-to-end
speedup; the headline number is dominated by ClickHouse insertion
throughput, not the parser.
…amFormat enum

selectToStream() and insertFromStream() each defined their own
structurally-identical 4-value format enum plus its own parse helper
and is-CSV / has-header predicates — ~35 lines of code that diverged
on enum name only. Adding a format alias (e.g. another short name)
required two synchronized edits.

Collapse to a single StreamFormat enum, one parseStreamFormat, one
streamFormatIsCSV, and one streamFormatHasHeader. The four wire
formats apply identically to both input parsing and output formatting;
the input and output sides of TSV/CSV are intentionally symmetric so
selectToStream output round-trips through insertFromStream.

No behavioral change; surfaced by the post-feature review pass.
Surfaced in the review as SS-003. The parser accepted `\N` at cell
start, pushed the literal two-byte sequence into cell_buf, and let
parsing continue. If anything but a cell separator or row terminator
followed, the trailing bytes were appended to cell_buf and pushCell's
bytes-based NULL check (cell_buf.size() == 2 && [0]=='\\' && [1]=='N')
no longer matched. Result: `\Nx` silently became the literal three-
char IS_STRING `"\Nx"` for String columns, or surfaced as a numeric
parse error one layer down for numeric columns — never as the
"`\N` is reserved as the whole-cell NULL marker" error ClickHouse's
own TSV reader emits.

Fix: track the NULL-marker state explicitly via a new cell_is_null
flag, set when `\N` is decoded at cell start (cell_buf empty). After
the cell_sep and row_term checks in feed(), if cell_is_null is still
true the next byte is content the user didn't mean to send — throw
"TSV `\\N` is the whole-cell NULL marker and cannot be followed by
other data". finishCell resets the flag.

CSV stays bytes-based and permissive: the docs say `\N` is the NULL
marker but CSV has no escape protocol, so `\Nx` in CSV remains the
literal 3-char string per the documented behavior. Test 108 pins both
sides of that — TSV strict, CSV permissive — plus the legal cases
(`\N` followed by tab, `\N` followed by newline, `\\N` decoded as
content, handle remains usable after the throws).

Pre-fix output of the regression test:
  trailing-after-N: NO THROW            (was silently landed as `\Nx`)
  nullable-after-N: NO THROW            (was silently landed as `\Ntrash`)
  total rows: 6                         (vs expected 4 — two bad rows snuck in)
…cell, permissive post-quote bytes

Three parser-correctness bugs flagged in a second-opinion review pass
on top of the original review. All three were silent data-integrity
issues, all repro'd locally against the previous tip.

CR-001 (Important, 0.94): NULL into non-Nullable column.

The CHANGELOG and README said "`\N` is rejected unless the target
column is Nullable". The implementation never enforced it. pushCell()
emitted ZVAL_NULL correctly, then insertColumn()'s String handler
(typesToPhp.cpp:1142) ZStrGuard'd the IS_NULL through zval_get_string,
which coerces to an empty zend_string — silently inserting "" for
String columns. Numeric columns likewise coerce IS_NULL through their
own paths. Pre-fix repro: TSV `\N` into a String column inserted one
row with length(s)=0.

Fix: a new acceptsNullCell() helper walks Nullable / LowCardinality
wrappers and answers whether the column type can hold a NULL. The
insertFromStream() on_row callback consults blockQuery (the schema
from BeginInsert) for each cell; an IS_NULL targeting a column that
does not accept NULL now throws the documented error before any byte
crosses the wire.

CR-002 (Important, 0.88): quoted-empty cell at EOF gets dropped.

A CSV file containing exactly `""` (open quote, close quote, no
trailing newline) inserted zero rows. The QuotePending->InCell
transition never pushed anything into cell_buf or row_cells, and
finish()'s flush condition `!cell_buf.empty() || !row_cells.empty()`
silently discarded the row. Pre-fix repro: insertFromStream() with
payload `""` and column ["s"] returns inserted=0.

Fix: add cell_is_quoted to the flush condition. The flag is true
whenever a cell was started by an opening quote and not yet finalized,
so it covers exactly the missing case without affecting empty-input.

CR-003 (Medium, 0.82): bytes after closing quote silently accepted.

The QuotePending branch fell through to ordinary cell handling on
any non-quote byte, so `"ab"c` parsed as `abc` and a stray space
after a closing quote was eaten. RFC 4180 and ClickHouse's own CSV
reader reject this — silent acceptance hides upstream export bugs.
Pre-fix repro: insertFromStream() with `"ab"c\n` inserted one row
with s="abc".

Fix: after a closing quote, accept only the cell separator, `\n`,
`\r`, or `""` (escape). Anything else throws.

Bonus fix on pushCell(): the bytes-based NULL check at finishCell
time was ambiguous between `\N` (TSV escape decoded to the NULL
marker) and `\\N` (TSV escape decoded to literal `\N` byte sequence).
The previous behavior treated both as NULL. Now pushCell takes the
parser's cell_is_null flag, so TSV `\\N` round-trips as the two-
character IS_STRING value while `\N` is NULL. CSV stays bytes-based
because CSV has no escape protocol.

Test 109 covers all three CRs (REJECTED probes for cr001-string,
cr001-numeric, cr003-bytes-after-quote, cr003-space-after-quote;
positive probes for cr002-quoted-empty, cr002-quoted-value,
cr003-doubled-quote, cr001-nullable-ok), plus a handle-still-usable
sanity. Test 108 updated to reflect the new pushCell semantics
(`\\N` is now a 2-char IS_STRING, not NULL).
…ect empty external tables upfront

Two more findings from a third review pass on the v0.8.1 stack.

CR-001 (Important, 0.93): stream read failure treated as EOF, partial
data committed.

insertFromStream's read loop was `while (!php_stream_eof(stream))`
followed by `if (n <= 0) break;`. A wrapper returning -1 (error) or 0
without setting EOF (transient unavailability) hit the break and
control flowed into parser.finish() + flush_batch() + EndInsert(),
committing whatever had been parsed up to the failure. Pre-fix repro:
userland stream wrapper that emits a warning and returns false on the
second read; insertFromStream returns rows=1 and the table contains
the partial first row ("row" — the first three bytes of "row-one"
before the failure interrupted parsing).

Fix: distinguish error from EOF. Throw on negative return; throw on
zero return when !php_stream_eof. The existing catch path's
"block_dirty || total_rows > 0 -> ResetConnection" branch then
discards the partial data instead of EndInsert()-committing it.

CR-002 (Important, 0.90): empty external-table rows skipped by lib,
server gives misleading error.

clickhouse-cpp deliberately skips zero-row external blocks
(lib/clickhouse-cpp/clickhouse/client.cpp:354) because the native
protocol uses an empty block as the end-of-stream marker for the
external-data section. Sending a real zero-row named block would
prematurely close the stream. Pre-fix, `selectWithExternalData` with
`rows => []` slipped through validation; the lib omitted the named
table entirely; the server then failed with the misleading "Unknown
expression or table expression identifier ext_..." error.

Fix: reject upfront in buildExternalTableBlock with a message that
points to the userland workaround (skip the query when the filter
set is empty). The protocol semantics make a "patch the lib to send
the empty block" alternative non-viable.

Test 110 covers both: stream-failure repro (custom userland wrapper)
asserts REJECTED with zero rows committed, empty-externals repro
distinguishes the new client-side rejection ("has no rows") from
the pre-fix server-side path ("Unknown expression").

License note for this repo: php_clickhouse stays on PHP 3.01 (carried
over from SeasClick original code) while the rest of the fleet
relicensed to BSD 3-Clause. New code here continues to use PHP 3.01.
…eam / insertFromStream args

PHP 7.4's php_stream_from_zval_no_verify emits an E_WARNING ("supplied
argument is not a valid stream resource") before returning NULL; PHP
8.x is silent on the same input. Tests 102 and 106 each have a
"not-a-stream" probe that passes the literal string "not a stream" to
exercise the argument-validation rejection — the throw itself is
identical on both runtimes, but the pre-throw warning text broke the
PHPT EXPECT diff on PHP 7.4 only.

Add @-suppression on the two probe arrows. Behavior is unchanged on
PHP 8.x. CI's PHP 7.4 lane goes back to green.

No production-code change. Other probes in 102/106 pass valid
php://memory resources and never tripped the warning, so they stay
as-is.
The test uses `[1.5 => "a", 0.1 => "b"]` which relies on PHP's
pre-8.1 silent float→int coercion of array keys. PHP 8.1+ added a
deprecation notice for that coercion, which surfaces in test stdout
and breaks the EXPECTF diff. The original EXPECTF (`key 0: 0.1` /
`key 1: 1.5`) never matched the coerced-to-int reality either, so the
test has been broken since PHP 8.1; it skipped silently on CI runners
that lack de_DE locale, but fails on any dev box with de_DE
installed.

Proper fix is to insert float keys via raw SQL `map(0.1, 'b', 1.5,
'a')` so PHP array semantics aren't in the picture, then update the
EXPECTF to whatever php_gcvt's default-precision Float64 formatting
emits (likely the full 0.10000000000000001 representation, not the
truncated 0.1). That's a separate design call about the locale-
independence guarantee's exact precision contract.

For now: SKIPIF on PHP_VERSION_ID >= 80100 with a clear message
naming the rewrite as the unblock. Unblocks the 0.8.5 release local
matrix; CI behavior unchanged (already skipped via locale absence).
Reject nulls in non-Nullable string and UUID insert paths, validate UUID hex input fully, reset streaming selects after callback exceptions, and preserve existing callbacks when setter validation fails.

Add PHPT coverage for strict string/UUID inserts, stream callback recovery, and invalid callback setters.
php:X.Y-cli Debian images don't ship /usr/bin/unzip, which composer
needs to extract PIE's prebuilt-binary zip. Without it, composer
silently falls back to PHP's ZipArchive (different on-disk layout)
and PIE's prePackagedBinary check trips ExtensionBinaryNotFound even
though the zip downloaded fine. Heads-up callout placed right under
the pie install command.

Trap captured at ~/ai/wiki/debugging/php-ext-release-traps.md.
PHP 8 raises ArgumentCountError when a userland call passes extra args
to a strict-arginfo internal method; PHP 7.4 only emits a non-fatal
E_WARNING and returns NULL, so the catch (Throwable) arm never fires
and the test prints "uptime extra: NO THROW" instead of the expected
"REJECTED".

Wrap that one assertion with a set_error_handler that promotes the
warning to a thrown RuntimeException for the duration of the call.
The 8+ ArgumentCountError path still lands in the same catch arm —
both versions reach REJECTED via Throwable. restore_error_handler()
keeps the rest of the test running under the engine's default
reporting.

Caught on the PHP 7.4 lane of CI run 26231330832 (Tests workflow on
54c8f7f); other lanes (8.1 through 8.5 + ASAN + ZTS + Windows + PIE
smoke) were green throughout.
dirname resolves to php-src dir in in-tree builds, move PHP_NEW_EXTENSION up and use $ext_srcdir instead
clone on ClickHouse and ClickHouseRowIterator fell through to the std
clone handler, which allocates a bare zend_object with no room for the
C++ prefix; any method call or destruction on the clone read or freed
out of bounds. Both handler sets now NULL clone_obj the way
ClickHouseStatement already did. ClickHouseStatement iteration moved
the internal pointer of the rows array that toArray()/jsonSerialize()
share with userland, tripping the zend_hash debug assertion and moving
the cursor of the caller's copy; iteration now uses a per-object
HashPosition. Row, cell, and column-name zvals are ZVAL_DEREF'd before
type checks so values left as references by foreach-by-ref are no
longer rejected by insert(), insertAssoc(), write(), and
insertFromStream(). write()'s catch block is additionally gated on
having acquired the wire (same shape as writeEnd's end_insert_started)
so a reentry throw cannot reset a connection an outer operation is
still reading; unreachable with the vendored client today because
BeginInsert detaches query events on return, kept as hardening.
Tests 120-122 fail on the unpatched build (ASan SEGV, zend_hash
assertion abort, exception on valid rows) and pass with the fixes.
…r, 128-bit overflow

Second codesage review round (run_20260610T014346Z) over all 68 method
slices surfaced 21 distinct root causes; this fixes them.

Memory safety:
- Add a get_gc handler exposing the progress/profile/verbose callback
  zvals. A closure that captures the client formed an uncollectable
  cycle (object -> closure -> object), leaking the object and its socket
  until request shutdown.
- Pin the callable across call_user_function in all three callback
  invocation sites. An array-callable that unregisters itself mid-call
  (setProgressCallback(null)) freed the executing handler ($this).
- selectToStream re-resolves the destination stream on every block flush
  (silent zend_fetch_resource2_ex) instead of caching php_stream*; a
  callback that fclose()s it now raises a clean exception rather than
  writing through a dangling pointer.
- Exception getters read into a caller-owned rv and ZVAL_DEREF the slot,
  so a subclass routing the property through __get can't return a
  dangling stack pointer.
- do_insert_into addref-pins the column-name strings for the call.
- runHelperSelectFirstRow frees its rows zval on the exception path.
- Vendored ResetConnection clears inserting_ before reconnect so a failed
  reconnect can't let ~Impl auto-commit a partial streaming insert
  (LOCAL_PATCHES.md).

Correctness:
- setDatabase rebuilds the Client with the new default database instead
  of a session USE, so the switch survives reconnects (including the
  native client's internal ping-before-query retry, invisible to the
  extension).
- fetchOne uses the positional row count, so a multi-column result with
  duplicate column names returns the row instead of a scalar.
- fetchKeyPair and select(FETCH_KEY_PAIR) reject a composite key column
  instead of collapsing every row onto "Array".
- Int128/UInt128 string inserts check overflow before the multiply,
  rejecting out-of-range values instead of wrapping.
- insertFromStream flushes deferred blank rows before the next row's
  separator (interior blank line now errors), rejects a blank line
  before a *WithNames header, and folds ClickHouse TabSeparated escapes
  (\', \b, \f, unknown \c -> c).
- selectWithExternalData derefs by-ref external-table entries.
- Per-call settings reject empty/non-string keys (an empty key desynced
  the native settings section).
- Malformed endpoints config throws instead of connecting to localhost.
- sanitizeError strips the trailing SQL fragment case-insensitively so a
  bound literal in ClickHouse 26.x's lowercase "while executing" suffix
  doesn't leak.
- getServerInfo/getCurrentEndpoint/getStatistics/getLogQueries/ping/
  resetConnection enforce zero-argument arity.

Tests 123-136 cover every reproducible fix; each fails on the prior
build and passes here. Full suite 133/133 (2 pre-existing skips).
Verified each finding from review run_20260610T123312Z against the
unpatched build, then fixed and covered each with a red/green test.

User-visible bugs:
- insert(): a row passed by reference and reassigned to a non-array
  mid-conversion (e.g. by a cell's __toString) crashed in
  buildSingleColumnZval; the per-row build now rechecks IS_ARRAY and
  throws. Test 137 (ASan SEGV pre-fix).
- insertFromStream(): a trailing blank line terminated by CRLF
  materialized a spurious zero-column row; the deferred-blank flush now
  skips the CRLF tail. Test 138.
- Typed placeholder values passed by reference were not dereferenced: a
  by-ref null formatted as an empty string and a by-ref array was
  stringified. Test 141.
- Placeholder arrays now reject an empty-string key up front. Test 139.
- endpoints config no longer prepends a phantom 127.0.0.1:port endpoint
  (from the default host property) when host is unset. Test 140.
- write([]) is a no-op that leaves the open insert intact instead of
  throwing and resetting the connection after a prior write() sent
  blocks. Test 142.
- selectStreamCallback() emits select_start/data_block/select_finish
  verbose events like select(). Test 143.
- DateTime64/Time64 render negative values with floor semantics:
  DateTime64(-0.5) is 1969-12-31 23:59:59.5 and Time64(-0.5) keeps its
  sign. Test 144.
- 133.phpt rewritten to pass on the PHP 7.4 CI lane (warning, not
  ArgumentCountError).

Hardening (no red test; fault-injection paths):
- Error sanitization cuts at the earliest SQL marker, not the first
  checked, closing a between-markers literal leak.
- A throwing empty-insert EndInsert() on the recovery path now falls
  back to a connection reset instead of wedging the client in inserting
  state.

Full suite: 142 passed, 0 failed, 2 skipped (TLS).
The get_gc handler added in b403777 used the zend_get_gc_buffer API and
the zend_object* handler signature, both PHP 8.0+, so the 7.4 CI lane
failed to compile (composer floor is >=7.4). 8.1 through 8.5, ASAN, ZTS,
Windows, and PIE were unaffected.

Provide a 7.4 path with the old zval* signature that fills a per-object
buffer (three slots for the progress/profile/verbose callbacks) and
points *table at it, matching the 8.x buffer semantics. Also drop the
now-unused port_configured tracking left by the endpoints host-clear fix,
which tripped -Wall on 7.4.

Verified: clean compile on PHP 7.4 and 8.4; full suite 142 passed,
0 failed, 2 skipped on 8.4.
… passes

ClickHouse defines __destruct, and PHP < 8.0's cycle collector reclaims a
cycle containing a destructor-bearing object across two gc_collect_cycles()
passes (pass one runs the destructor, pass two frees). PHP 8.0+ reclaims it
in one. A single collect left the WeakReference live on 7.4, failing the
test; the get_gc handler is correct (the cycle IS reclaimed, confirmed via
gc_collect_cycles() returning non-zero, just a pass later).

Collect twice. On 8.0+ the second call is a no-op. Reproduced with a pure
PHP destructor-cycle (no extension) to confirm it's an engine behavior, not
a handler defect.
Both deliberately switch to de_DE to prove the extension formats floats
locale-independently; they only ever ran on CI when the runner shipped
de_DE (it doesn't), so two latent 7.4 issues stayed hidden until the
local 7.4 lane gained the locale.

064: the write path is correct (the {x:Float64} round-trip reaches the
server as "1.5" and succeeds), but the assertion echoed the returned
double. PHP < 8.0 renders a float through LC_NUMERIC ("1,5" under de_DE);
PHP 8.0 made float-to-string locale-independent. Compare the value
numerically instead so the test verifies the wire format, not the
engine's echo behavior.

070: a PHP array literal [1.5 => "a"] coerces the float key to int
(silent < 8.1, deprecated after), so the keys never reached the Float64
read decoder the test targets -- the EXPECTF never matched reality, which
is why it skipped on >= 8.1 and silently mis-ran below. Build the map
server-side via map(1.5,'a',0.1,'b'), drop the version skip, and match
the locale-independent string key php_gcvt emits (0.10000000000000001).

Verified on 7.4 and 8.4 (identical keys); full 8.4 suite 143 passed,
0 failed.
- Update vendored tree to 2.6.2
- Mark memcpy guard in string.cpp as obsoleted (upstreamed)
- Re-apply/adapt remaining 5 local patches (BeginInsert query_id, cityhash guard, public Query overloads for BeginInsert+SelectWithExternalData, early insert state clear in ResetConnection)
- Add new columns bool.cpp/json.cpp to config.m4 + config.w32
- Fix Map(Float) key lifetime: release ref on temporary map_zv after add, use std::string for float key strings instead of stack kbuf
- Clear per-Query callbacks after Select to avoid closures holding refs to do_select_into stack
- Add missing main/snprintf.h include for php_gcvt prototype
- 070.phpt now strict (no trailing %a); test passes cleanly under ASAN build
- Add minimal repro tests/070_minimal.php for the locale+Map(Float) case
- Update docs and changelog for the bump
Removed the erroneous Z_DELREF_P after add_next_index_zval/sc_add_assoc_zval_ex
for the map_zv (and the comment claiming the add does an inc).

This was causing heap-use-after-free on the Map array (allocated at array_init
in convertToZval) under ASAN when the result row / foreach / json_encode etc.
later destroyed it. The add takes ownership of the refcount=1 from array_init;
ZVAL_UNDEF is sufficient to detach the SC_MAKE pointer temp (consistent with
emitNestedZval and other complex column paths in the same file).

The std::string float-key copies (in the kkind==2 branches) and the 7
post-Select On* clears remain.

Addresses the sanitizer (and ZTS) crashes on 019.phpt (and related Map tests)
that made the 2.6.2 bump CI red. 086.phpt paths were inspected and untouched
by this change.
…n to 2.6.2

The 5 local patches re-applied against vendored clickhouse-cpp v2.6.2 are
now kept as applyable .patch files under patches/upstream/, so future lib
bumps can git-apply them instead of hand-reconstructing from LOCAL_PATCHES.md
prose. Bump .upstream/clickhouse-cpp.yml pin 2.6.1 -> 2.6.2 to match what is
vendored (and what upstream ships); the empty string_view memcpy UB guard is
now upstream and no longer carried locally.
clickhouse-cpp 2.6.2's VariantIndex trait built a std::variant in a
constexpr non-type template argument, which MSVC 14.29 (VS2019, the PHP 8.3
Windows toolset) rejects with C2440. VS2022 (8.4/8.5) and GCC/Clang accept
it, so the break only hit the 8.3 Windows lane -- and only in
release-windows.yml, which runs after the tag. 0.8.6 shipped with no PHP 8.3
Windows DLLs as a result.

Compute the index with a std::is_same_v fold instead (no std::variant
constructed); identical result, portable across toolsets. Drop the now-unused
VariantIndexTag. Recorded as local patch 0006 + LOCAL_PATCHES.md entry.

Expand the pre-tag Windows gate (tests.yml) from 8.4 to 8.3/8.4/8.5 so a
vs16-only break blocks the release instead of surfacing post-tag.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants