[TINKERPOP 3107] Remove Groovy dependency from default server initialization by Cole-Greer · Pull Request #3384 · apache/tinkerpop

Cole-Greer · 2026-04-14T20:05:14Z

https://issues.apache.org/jira/browse/TINKERPOP-3107

Gremlin Server currently relies on Groovy init scripts for server initialization — binding traversal sources, running lifecycle hooks, and loading data. This PR introduces a Groovy-free initialization path so that Groovy can be disabled by default in all shipped server configs. Groovy remains available as an
opt-in for backward compatibility.

Changes

Three new YAML configuration mechanisms replace the Groovy init scripts:

Auto-created TraversalSources — After graphs are loaded, a default TraversalSource is automatically created for each graph that doesn't have an explicit traversalSources entry. A graph named graph gets g; others get g_. A minimal config with just a graphs section is now fully functional.

Declarative traversalSources — A new YAML section for explicit TraversalSource creation with optional strategy configuration via a Gremlin query:
yaml
traversalSources: {
g: {graph: graph},
gReadOnly: {graph: graph, query: "g.withStrategies(ReadOnlyStrategy)"}}

Java-based lifecycleHooks — A new YAML section for configuring LifeCycleHook implementations via reflection, replacing Groovy-based hook creation:
yaml
lifecycleHooks:

className: org.apache.tinkerpop.gremlin.server.util.TinkerFactoryDataLoader
config: {graph: graph, dataset: modern}

Supporting changes

TinkerFactoryDataLoader — built-in LifeCycleHook that loads TinkerFactory sample datasets (modern, classic, crew, grateful, sink, airroutes) without Groovy
LifeCycleHook interface — added default void init(Map) for configuration and GraphManager to Context
Settings — new TraversalSourceSettings and LifeCycleHookSettings config classes with SnakeYAML type descriptors
ServerTestHelper — null-safe for configs without gremlin-groovy in scriptEngines
Deprecated the Groovy-based LifeCycleHook/TraversalSource creation path with startup warnings
Script engine allowlist — GremlinExecutor now restricts available script engines to those
explicitly listed in scriptEngines (plus gremlin-lang, which is always permitted).
Requests targeting an engine not in the allowlist are rejected. This prevents clients from
invoking gremlin-groovy when it is removed from the default configuration.
New gremlin-server-airroutes.yaml shipped config

Config migrations

All default configs under gremlin-server/conf/ updated to remove gremlin-groovy from scriptEngines. All Docker integration test configs (docker/gremlin-server/) and JVM test configs migrated to use traversalSources + lifecycleHooks instead of generate-all.groovy. The gremlin-console test infrastructure
similarly migrated.

Deleted files

gremlin-server/src/test/scripts/generate-all.groovy — replaced by YAML config
gremlin-server/src/test/scripts/test-server-start.groovy — dead code
gremlin-console/src/test/resources/.../generate.groovy — replaced by YAML config

Testing

TinkerFactoryDataLoaderTest — all 6 datasets, error cases (missing graph, non-TinkerGraph, unknown dataset, missing config)
ServerGremlinExecutorTest — auto-creation, explicit suppression, lifecycle hook instantiation, resolveLanguage logic
SettingsTest — YAML parsing for traversalSources and lifecycleHooks, defaults when absent
GremlinServerConfigIntegrateTest — parameterized smoke test that boots each shipped config and runs a query
Existing JVM integration tests exercise the migrated gremlin-server-integration.yaml
GLV integration tests exercise the migrated Docker configs

VOTE +1

spmallette · 2026-04-15T00:15:43Z

+----
+traversalSources: {
+  g: {graph: graph},
+  gReadOnly: {graph: graph, query: "g.withStrategies(ReadOnlyStrategy)", language: "gremlin-lang"}}


is "query" the right terminology here? it's not really a functioning query - it's only allowing configuration steps i assume? maybe it's a "sourceConfiguration"?? you use "expression" below - that's better - like, "gremlinExpression"?? "sourceExpression"??

anything stopping someone from doing stuff other than configuration steps? what happens if i use another language and do g.V()....? seems like we might want some controls there if there aren't any already.

I'll change it to "gremlinExpression".

The only control that currently exists is whatever the script engine outputs from that "gremlinExpression" must be castable to TraversalSource. I believe that should block most accidental misuse, without really restricting any legitimate configuration.

I'm personally not as worried about additional enforced restrictions. My assumption is anyone with access to the server config is already a trusted individual. The ability to run a gremlin query during traversal source initialization is already a much weaker tool than the groovy init scripts of old.

i think the protection of TraversalSource is fine. the issue i had here was definitely with trusted individuals trying to do "interesting" things that we really don't want to support like, g.V().drop() as a "convenient" way to clean a graph for some test rig they got going on. or to load data, or whatever. just don't want to see that kind of usage pop up.

If they really wanted to try, they could probably sneak in multiple traversals in that single expression, as long as the last one returned a TraversalSource. I hope that's obviously enough of a bad idea that users aren't drawn to it, especially if we give a more convenient and powerful option via the lifecycle hooks.

spmallette · 2026-04-15T11:08:19Z

+  g: {graph: graph, query: "g.withStrategies(ReadOnlyStrategy)"}}
+----
+
+Alternatively, the deprecated `ScriptFileGremlinPlugin` approach can still be used with a Groovy script:


i still wonder if we should document any of the Groovy configuration. Other than to have a callout that says that the groovy stuff from 3.x is still there, I wonder if we still need all this stuff. I don't think we should offer an "Alternatively, " in 4.x.

I've changed how the groovy configs are presented in the docs. I don't want to completely remove it from the docs as long as it still exists, but I'm now framing it as clearly a deprecated legacy option which points to the new config as the recommended path.

spmallette · 2026-04-15T11:10:35Z

 [[script-execution]]
 ==== Protecting Script Execution

+NOTE: As of 4.0.0, Groovy is no longer required for server initialization.  The `scriptEngines` configuration


Pointing back to my last comment, this whole section could just go away rather than muddy up the conversation with callouts and "don't do this, but here it is in case you want to" discussion.

I think this section on hardening groovy still has some value for anyone who still chooses to use the groovy script engine. I've reworded the heading to make it clear these are specifically groovy concerns, and that gremlin-lang is the default and recommended path, which sidesteps these hardening concerns.

spmallette · 2026-04-15T11:19:17Z


+=== Upgrading for Users
+
+==== Gremlin Server Initialization Without Groovy


this is all "what" changed - it's virtually all Reference Documentation. there's nothing about "why". there's nothing about the "benefit". nothing that prepares the reader for what's coming in the "future" with a server without Groovy. get people thinking forward. like, "Auto-created TraversalSources" - like, doesn't that "simplify" the configuration wonderfully? what about the "challenges" - like, how to maintain provider flexibility without a full scripting engine? where are the words that make this major new change have some life?

I've reworked the upgrade docs to tell more of the "why", and to better demonstrate the conveniences of the new declarative config system.

How about "More Secure Gremlin Server" as a title?

The first sentence is rough with the semicolon:

Previous versions of Gremlin Server relied on the Groovy script engine for basic server initialization; binding
traversal sources, loading data, and running lifecycle hooks all required Groovy init scripts.

How about:

Previous versions of Gremlin Server relied on a Gremlin-flavored Groovy ScriptEngine for basic server initialization, which covered binding traversal sources, loading data, and running lifecycle hooks all setup by Groovy initialization scripts.

Change "script engine" to ScriptEngine consistently. If this is about security, I think that it would make sense to mention how an earlier version took a step toward security by making gremlin-language the default for script execution and this is the second step required to make the server more secure by default. you allude to that in the second paragraph without that earlier reference about sending remote scripts. I would further allude to the idea that a future version will wholly remove Groovy as an installed option in the server.

spmallette · 2026-04-15T11:25:10Z

                .map(kv -> (LifeCycleHook) kv.getValue())
                .collect(Collectors.toList());

+        // process declarative traversalSources from YAML config - track which graphs have explicit entries


Are we allowing both script and declarative configurations to live in the same yaml? If so, is that a sharp edge that might mess someone up?

I don't intend for users to mix and match groovy init scripts with the declarative approach, but I hadn't intended to restrict it either. My preference would be for users to completely migrate to the new configs in one atomic step, however I don't see any reason to prevent users from say converting their traversalSource configuration to the declarative system, while retaining an old groovy LifeCycleHook for a while to do data loading.

In the long-term, they will need to move everything to the new system, but it seems reasonable to allow a progressive migration.

I will followup on the order of operations there to see if there are likely to be bad interactions between the 2 systems.

ok, it's probably fine for them to coexist as long as they don't introduce unexpected problems or have some undocumented behavior. so, if one overwrites the other or something then at least make sure that's clear in the docs. i wouldn't be sad if the server refused to start and folks were forced into one config path or the other, but i don't think it's essential.

I'm not finding any really concerning sharp edges with co-existance. Only points really are that declarative traversalSources will overwrite any added from groovy scripts, and the YAML-based lifecycle hooks run before the groovy ones. I've added this to the reference docs.

vkagamlyk · 2026-04-16T16:04:21Z

Can you add unit test to verify gremlin-groovy is disabled? something like

 @Test
    public void gremlinLang() {
        final Cluster cluster = Cluster.build().addContactPoint("localhost").port(8182).create();
        final Client client = cluster.connect();

        // should handle traversal
        final GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using(cluster, "gmodern"));
        long count = g.V().count().next();
        assertEquals(6, count);

        // should handle script
        final RequestOptions noLang = RequestOptions.build().addAlias("g", "gmodern").create();
        count = client.submit("g.V().count().next()", noLang).one().getLong();
        assertEquals(6, count);

        // should handle script with explicit engine name
        final RequestOptions requestOptions = RequestOptions.build().language("gremlin-lang").addAlias("g", "gmodern").create();
        count = client.submit("g.V().count().next()", requestOptions).one().getLong();
        assertEquals(6, count);

        // should not allow suspicious queries
        try {
            client.submit("2+2", requestOptions).one().getLong();
            fail("should throw exception");
        } catch (final CompletionException e) {
            final Throwable inner = e.getCause();
            assertTrue(inner instanceof ResponseException);
            assertEquals(ResponseStatusCode.SERVER_ERROR_EVALUATION, ((ResponseException) inner).getResponseStatusCode());
        }

        // in gremlin-groovy '1g' is valid BigDecimal value, but in gremlin-lang should be '1m';
        // the easiest way to determine which script engine the request was executed on
        try {
            client.submit("g.inject(1g)", requestOptions).one().getLong();
            fail("should throw exception");
        } catch (final CompletionException e) {
            final Throwable inner = e.getCause();
            assertTrue(inner instanceof ResponseException);
            assertEquals(ResponseStatusCode.SERVER_ERROR_EVALUATION, ((ResponseException) inner).getResponseStatusCode());
        }

        final BigDecimal one = (BigDecimal)client.submit("g.inject(1m)", requestOptions).one().getObject();
        assertEquals(BigDecimal.ONE, one);
    }

Cole-Greer · 2026-04-23T00:42:44Z

Can you add unit test to verify gremlin-groovy is disabled? something like

 @Test
    public void gremlinLang() {
        final Cluster cluster = Cluster.build().addContactPoint("localhost").port(8182).create();
        final Client client = cluster.connect();

        // should handle traversal
        final GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using(cluster, "gmodern"));
        long count = g.V().count().next();
        assertEquals(6, count);

        // should handle script
        final RequestOptions noLang = RequestOptions.build().addAlias("g", "gmodern").create();
        count = client.submit("g.V().count().next()", noLang).one().getLong();
        assertEquals(6, count);

        // should handle script with explicit engine name
        final RequestOptions requestOptions = RequestOptions.build().language("gremlin-lang").addAlias("g", "gmodern").create();
        count = client.submit("g.V().count().next()", requestOptions).one().getLong();
        assertEquals(6, count);

        // should not allow suspicious queries
        try {
            client.submit("2+2", requestOptions).one().getLong();
            fail("should throw exception");
        } catch (final CompletionException e) {
            final Throwable inner = e.getCause();
            assertTrue(inner instanceof ResponseException);
            assertEquals(ResponseStatusCode.SERVER_ERROR_EVALUATION, ((ResponseException) inner).getResponseStatusCode());
        }

        // in gremlin-groovy '1g' is valid BigDecimal value, but in gremlin-lang should be '1m';
        // the easiest way to determine which script engine the request was executed on
        try {
            client.submit("g.inject(1g)", requestOptions).one().getLong();
            fail("should throw exception");
        } catch (final CompletionException e) {
            final Throwable inner = e.getCause();
            assertTrue(inner instanceof ResponseException);
            assertEquals(ResponseStatusCode.SERVER_ERROR_EVALUATION, ((ResponseException) inner).getResponseStatusCode());
        }

        final BigDecimal one = (BigDecimal)client.submit("g.inject(1m)", requestOptions).one().getObject();
        assertEquals(BigDecimal.ONE, one);
    }

I've adapted this test and added it to the server integration tests. I've given it extra cases to additionally verify error responses if the driver explicitly asks for language: "gremlin-groovy" when it is not configured (based on a new script engine allowlist), as well as if the driver asks for a completely unknown language.

spmallette · 2026-04-23T11:04:55Z

+Graphs with explicit `traversalSources` entries are excluded from auto-creation.
+
+[[server-lifecycle-hooks]]
+==== LifeCycleHooks


now that Gremlin Server relies on compiled code, i think you have to explain where to place your jar for this stuff to work if you build a custom one. can they use the gremlin-server.sh -i command?

you don't have to do it for this PR if you don't want to, but i think an open item here for 4.0 final is to offer another general purpose LifeCycleHook. At least one I can think of that folks would use again and again is a basic DefaultDataLoader that can just take a file path to a TinkerPop supported format as input. seems like a fast add, but if you want to put it off, please create a JIRA.

I had a similar thought when working on the TinkerFactoryDataLoader hook for this PR. I think it would be nice to elevate the TinkerFactory generate<Dataset>() methods from tinkergraph-gremlin to gremlin-core, along with a refactor to work with any arbitrary Graph. All of the small graphs are already simply based on the Structure API. There's no reason we couldn't make those directly available to arbitrary graphs. For airRoutes and grateful, I'd also like to make them more generally available (perhaps restricted to graphs which support reading gryo via the io() step, or perhaps via a long gremlin load script).

Taking this a step further and providing a prepackaged data loader where users just need to provide a path in the config is a nice convenience. I think with the Java lifecycle hooks, we can offer a lot more prebuilt configurable lifecycle hooks, instead of asking providers to write all their own scripts.

Spinning out into a new JIRA: https://issues.apache.org/jira/browse/TINKERPOP-3245

first one: GroovyLifeCycleHook

spmallette · 2026-04-23T11:08:37Z

+     *
+     * @param config the key/value pairs from the {@code config} block in the YAML entry
+     */
+    public default void init(final Map<String, Object> config) {}


this was never clear before i don't think but can a LifeCycleHook prevent the server from starting by throwing an exception? if so, i think it would be good to add some guidance here in the javadoc for that.

vkagamlyk · 2026-04-24T05:12:44Z

-               org.apache.tinkerpop.gremlin.groovy.jsr223.GroovyCompilerGremlinPlugin: {expectedCompilationTime: 30000},
-               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
-               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/generate-all.groovy]}}}}
+traversalSources: {


Considering that this is TinkerPop 4, maybe it makes sense to reconsider the graph initialization?

for example can traversal sources and lifecycleHooks be part of graph configuration?

that makes some sense actually. maybe the configuration makes more sense living with the graph rather than referencing the graph all over the yaml (i think that's what @vkagamlyk 's comment was about at least). that would lead to a nicer configuration:

graphs: { graph: { configuration: conf/tinkergraph-empty.properties, traversalSources: - g - { name: "gg", gremlinExpression: "g.withStrategies(ReadOnlyStrategy)", language: "gremlin-lang" } lifecycleHooks: - { className: org.apache.tinkerpop.gremlin.server.util.TinkerFactoryDataLoader, config: {dataset: modern}} - { className: com.software.company.GroovyInsanity, config: {script: "System.exit(0)"}} } }

also, in the docs it's written that:

A graph named graph is implicitly bound to g

All others are bound to g_<graph-name> (e.g. modern gets g_modern)

but does that mean that the configuration below produces 2 traversal sources in this server (except for g: {graph: graph}, - like, gclassic and g_classic? a

~~looking at the code, i guess the above is just a documentation problem - it's not quite clear that it's referring to graphs that have no explicitly configured TraversalSource that get that naming.~~

now i see...i was reading the upgrade docs - in that context i guess what they say makes sense and the reference docs are clear. i guess that answers my questions.

Other possible options:

To have all graph related config in single .properties file, something like

gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph gremlin.graph.traversalSource=g gremlin.graph.traversalSource={ name: "gg", gremlinExpression: "g.withStrategies(ReadOnlyStrategy)", language: "gremlin-lang" } gremlin.graph.lifecycleHook={ className: org.apache.tinkerpop.gremlin.server.util.TinkerFactoryDataLoader, config: {dataset: modern}} gremlin.graph.lifecycleHook={ className: com.software.company.GroovyInsanity, config: {script: "System.exit(0)"}} gremlin.tinkergraph.vertexIdManager=LONG

or structured

gremlin.graph.traversalSource.1.name=gg gremlin.graph.traversalSource.1.gremlinExpression=g.withStrategies(ReadOnlyStrategy) gremlin.graph.traversalSource.1.language=gremlin-lang

To get rid of .properties files and have all config in single server yaml file

graphs: { graph: { graph: org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph, vertexIdManager: LONG, traversalSources: - g - { name: "gg", gremlinExpression: "g.withStrategies(ReadOnlyStrategy)", language: "gremlin-lang" } lifecycleHooks: - { className: org.apache.tinkerpop.gremlin.server.util.TinkerFactoryDataLoader, config: {dataset: modern}} - { className: com.software.company.GroovyInsanity, config: {script: "System.exit(0)"}} } }

i like your ideas to simplify....i think the properties file just has relevance as it is because you can give it to Graph.open(Configuration) which wouldn't have any relevance for those settings outside of Gremlin Server. But, maybe that's ok?? they would just be ignored and only Gremlin Server would concern itself with them? 🤔

yes, properties file is most convenient way for providers for additional graph configuration, It probably makes sense to keep it...
traversalSources and lifecycleHooks are outside of graph, so should have separate config.
I think @spmallette proposal looks better than mine.

I like the idea of pulling the traversalSources config into the graphs config. I'm less sure about lifecycleHooks. In practice, the lifecycleHooks are generally just used to load data into a graph or something like that, but in general, they are globally scoped in the server, they currently aren't bound to specific graph instances. I'll play around with it a bit and see how it feels.

yeah, that's true. i suppose they aren't constrained to a specific graph.

spmallette reviewed Apr 15, 2026

View reviewed changes

Comment thread ...in-server/src/test/resources/org/apache/tinkerpop/gremlin/server/gremlin-server-minimal.yaml

Cole-Greer added 10 commits April 22, 2026 17:08

initial PoC

c3ec80c

update test configs

c3f156f

docs and tests

4aec190

cleanups and fixes

ff9a18a

add airroutes server config

919ac22

.

ad48b95

.

01b708d

Rename query to gremlinExpression

e024c92

Update docs and tests according to PR feedback

640922b

Generate script engine allowlist in server from config.

774548f

Cole-Greer force-pushed the TINKERPOP-3107 branch from cd8dce5 to 774548f Compare April 23, 2026 00:11

document behaviour when groovy script and declarative inits coexist

6b54974

fix tests

61786d7

spmallette reviewed Apr 23, 2026

View reviewed changes

fix .net tests

53b5c3f

vkagamlyk reviewed Apr 24, 2026

View reviewed changes

Comment thread ...ore/src/main/java/org/apache/tinkerpop/gremlin/jsr223/DefaultGremlinScriptEngineManager.java


		=== Upgrading for Users

		==== Gremlin Server Initialization Without Groovy

Conversation

Cole-Greer commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Supporting changes

Config migrations

Deleted files

Testing

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

vkagamlyk commented Apr 16, 2026

Uh oh!

Cole-Greer commented Apr 23, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

spmallette Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vkagamlyk Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Cole-Greer commented Apr 14, 2026 •

edited

Loading

spmallette Apr 24, 2026 •

edited

Loading

vkagamlyk Apr 24, 2026 •

edited

Loading