2 changes: 1 addition & 1 deletion csharp/ql/lib/qlpack.yml
Original file line number Diff line number Diff line change
@@ -16,6 +16,6 @@ dependencies:
codeql/xml: ${workspace}
dataExtensions:
- ext/*.model.yml
- - ext/generated/*.model.yml
+ - ext/generated/**/*.model.yml
warnOnImplicitThis: true
compileForOverlayEval: true
@@ -0,0 +1,8 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/avro.git#79017ee391c04f60bdffd5fecf9ecc27c1b1f420 by codeql-mads-via-llm
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.apache.avro.data", "ObjectReader", True, "read", "(Object,Decoder)", "", "Argument[1]", "unsafe-deserialization", "ai-generated"]
Contributor

I tried to look up the docs for this and found that it's actually a nested class. I believe we would need the syntax below to specify it correctly. (You can see this used for java.io.ObjectInputFilter.Config, for example.)

Suggested change
- - ["org.apache.avro.data", "ObjectReader", True, "read", "(Object,Decoder)", "", "Argument[1]", "unsafe-deserialization", "ai-generated"]
+ - ["org.apache.avro.data", "Json$ObjectReader", True, "read", "(Object,Decoder)", "", "Argument[1]", "unsafe-deserialization", "ai-generated"]
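
For reference, the `Outer$Inner` spelling in the suggestion is the same form the JVM uses for the binary name of a nested class. The sketch below is only an illustration: it assumes the type column of a model row follows that binary-name convention, as the suggestion implies, and the helper `packageAndType` is a hypothetical name, not part of CodeQL or Avro. It uses two JDK nested types because Avro is not on the classpath here.

```java
import java.io.ObjectInputFilter;
import java.util.Map;

public class NestedBinaryName {
    // Hypothetical helper: splits a class's binary name into the two leading
    // columns of a model row (package, type). Requires Java 9+ for getPackageName().
    static String[] packageAndType(Class<?> c) {
        String binary = c.getName();           // e.g. "java.util.Map$Entry"
        String pkg = c.getPackageName();       // e.g. "java.util"
        String type = binary.substring(pkg.length() + 1);
        return new String[] { pkg, type };
    }

    public static void main(String[] args) {
        for (Class<?> c : new Class<?>[] { Map.Entry.class, ObjectInputFilter.Config.class }) {
            String[] cols = packageAndType(c);
            System.out.println(cols[0] + " | " + cols[1]);
        }
        // prints:
        // java.util | Map$Entry
        // java.io | ObjectInputFilter$Config
    }
}
```

By the same rule, a class `ObjectReader` nested in `org.apache.avro.data.Json` would have the type name `Json$ObjectReader`, not `ObjectReader`.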

Contributor

Also, Gemini 3.1 Pro doesn't think this should be a sink.

Reasoning: Unsafe deserialization vulnerabilities occur when an application deserializes untrusted data into arbitrary Java objects (allowing an attacker to trigger malicious gadget chains). However, Json.ObjectReader is designed to strictly read Avro-encoded data matching the specific Json.SCHEMA internal to Apache Avro.

If you examine its implementation, it maps incoming primitive tokens directly to basic, safe Jackson JsonNode types (like LongNode, DoubleNode, TextNode, ArrayNode, and ObjectNode) and then unwraps them into basic Java structures (Map, List, String, Long, etc.). Since it does not perform polymorphic deserialization or resolve arbitrary class names from the data stream, it is structurally immune to unsafe class instantiation and does not act as a deserialization sink.
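
To make the distinction in that reasoning concrete, here is a minimal sketch of a reader in which the input can only select among a fixed set of safe types. It is not Avro's implementation: the one-byte tags and the class `FixedTypeReader` are hypothetical, chosen only to show the shape of a decoder that never resolves a class name from the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FixedTypeReader {
    // Hypothetical one-byte tags; this is NOT Avro's wire format.
    static final int TAG_LONG = 0, TAG_DOUBLE = 1, TAG_STRING = 2, TAG_ARRAY = 3;

    // Every branch constructs a type chosen by this code. The input picks a tag,
    // but it can never name a class to instantiate, unlike
    // java.io.ObjectInputStream.readObject(), which resolves class names
    // found in the stream.
    static Object read(DataInputStream in) throws IOException {
        int tag = in.readUnsignedByte();
        switch (tag) {
            case TAG_LONG:
                return in.readLong();
            case TAG_DOUBLE:
                return in.readDouble();
            case TAG_STRING:
                return in.readUTF();
            case TAG_ARRAY: {
                int n = in.readInt();
                if (n < 0) throw new IOException("negative length: " + n);
                List<Object> items = new ArrayList<>();
                for (int i = 0; i < n; i++) items.add(read(in));
                return items;
            }
            default:
                throw new IOException("unknown tag: " + tag);
        }
    }

    // Encodes the array [42, "hi"] using the hypothetical tags above.
    static byte[] sample() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeByte(TAG_ARRAY);
        out.writeInt(2);
        out.writeByte(TAG_LONG);
        out.writeLong(42L);
        out.writeByte(TAG_STRING);
        out.writeUTF("hi");
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Object value = read(new DataInputStream(new ByteArrayInputStream(sample())));
        System.out.println(value); // prints [42, hi]
    }
}
```

A decoder of this shape can still be fed malformed or oversized input, but it offers no route to a gadget chain, which is the property the "unsafe-deserialization" sink kind is concerned with.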

Author

Interesting. Out of curiosity, how did you ask the model? Did you provide the source code, which the tool (currently) does not?
My impression is that you can ask it multiple times and get different results, which is why I would like to know what input it was given.

Contributor

> Which arguments of org.apache.avro.data.Json.ObjectReader.read, if any, should be "unsafe-deserialization" sinks for CodeQL?

It looks like it searched GitHub and found the source code. In the past I have given it the javadocs that I've found online, but I was lazy this time.

@@ -0,0 +1,38 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/avro.git#79017ee391c04f60bdffd5fecf9ecc27c1b1f420 by codeql-mads-via-llm
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.apache.avro.file", "DataFileReader", True, "DataFileReader", "(File,DatumReader)", "", "Argument[0]", "path-injection", "ai-generated"]
- ["org.apache.avro.file", "DataFileReader", True, "openReader", "(File,DatumReader)", "", "Argument[0]", "path-injection", "ai-generated"]
- ["org.apache.avro.file", "DataFileWriter", True, "appendTo", "(File)", "", "Argument[0]", "path-injection", "ai-generated"]
- ["org.apache.avro.file", "DataFileWriter", True, "create", "(Schema,File)", "", "Argument[1]", "path-injection", "ai-generated"]
- ["org.apache.avro.file", "SeekableFileInput", True, "SeekableFileInput", "(File)", "", "Argument[0]", "path-injection", "ai-generated"]
- ["org.apache.avro.file", "SyncableFileOutputStream", True, "SyncableFileOutputStream", "(File)", "", "Argument[0]", "path-injection", "ai-generated"]
- ["org.apache.avro.file", "SyncableFileOutputStream", True, "SyncableFileOutputStream", "(File,boolean)", "", "Argument[0]", "path-injection", "ai-generated"]
- ["org.apache.avro.file", "SyncableFileOutputStream", True, "SyncableFileOutputStream", "(String)", "", "Argument[0]", "path-injection", "ai-generated"]
- ["org.apache.avro.file", "SyncableFileOutputStream", True, "SyncableFileOutputStream", "(String,boolean)", "", "Argument[0]", "path-injection", "ai-generated"]
- addsTo:
pack: codeql/java-all
extensible: sourceModel
data:
- ["org.apache.avro.file", "DataFileReader12", True, "getMeta", "(String)", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "DataFileReader12", True, "getMetaString", "(String)", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "DataFileReader12", True, "getSchema", "()", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "DataFileReader12", True, "iterator", "()", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "DataFileReader12", True, "next", "()", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "DataFileReader12", True, "next", "(Object)", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "DataFileStream", True, "getHeader", "()", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "DataFileStream", True, "getMeta", "(String)", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "DataFileStream", True, "getMetaKeys", "()", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "DataFileStream", True, "getMetaString", "(String)", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "DataFileStream", True, "getSchema", "()", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "DataFileStream", True, "iterator", "()", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "DataFileStream", True, "next", "()", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "DataFileStream", True, "next", "(Object)", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "DataFileStream", True, "nextBlock", "()", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "FileReader", True, "getSchema", "()", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "FileReader", True, "next", "(Object)", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro.file", "SeekableInput", True, "read", "(byte[],int,int)", "", "Argument[0]", "file", "ai-generated"]
@@ -0,0 +1,9 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/avro.git#79017ee391c04f60bdffd5fecf9ecc27c1b1f420 by codeql-mads-via-llm
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.apache.avro.generic", "GenericDatumReader", True, "read", "(Object,Decoder)", "", "Argument[1]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.generic", "GenericDatumReader", True, "read", "(Object,Schema,ResolvingDecoder)", "", "Argument[2]", "unsafe-deserialization", "ai-generated"]
@@ -0,0 +1,11 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/avro.git#79017ee391c04f60bdffd5fecf9ecc27c1b1f420 by codeql-mads-via-llm
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.apache.avro.io", "DatumReader", True, "read", "(Object,Decoder)", "", "Argument[1]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.io", "ExecutionStep", True, "execute", "(Object,Decoder)", "", "Argument[1]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.io", "FieldReader", True, "read", "(Object,Decoder)", "", "Argument[1]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.io", "RecordReader", True, "read", "(Object,Decoder)", "", "Argument[1]", "unsafe-deserialization", "ai-generated"]
@@ -0,0 +1,20 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/avro.git#79017ee391c04f60bdffd5fecf9ecc27c1b1f420 by codeql-mads-via-llm
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.apache.avro.message", "BaseDecoder", True, "decode", "(ByteBuffer)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.message", "BaseDecoder", True, "decode", "(ByteBuffer,Object)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.message", "BaseDecoder", True, "decode", "(InputStream)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.message", "BaseDecoder", True, "decode", "(byte[])", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.message", "BaseDecoder", True, "decode", "(byte[],Object)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.message", "BinaryMessageDecoder", True, "decode", "(InputStream,Object)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.message", "MessageDecoder", True, "decode", "(ByteBuffer)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.message", "MessageDecoder", True, "decode", "(ByteBuffer,Object)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.message", "MessageDecoder", True, "decode", "(InputStream)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.message", "MessageDecoder", True, "decode", "(InputStream,Object)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.message", "MessageDecoder", True, "decode", "(byte[])", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.message", "MessageDecoder", True, "decode", "(byte[],Object)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.message", "RawMessageDecoder", True, "decode", "(InputStream,Object)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
@@ -0,0 +1,22 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/avro.git#79017ee391c04f60bdffd5fecf9ecc27c1b1f420 by codeql-mads-via-llm
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.apache.avro", "Parser", True, "parse", "(File)", "", "Argument[0]", "path-injection", "ai-generated"]
- ["org.apache.avro", "Protocol", True, "parse", "(File)", "", "Argument[0]", "path-injection", "ai-generated"]
- ["org.apache.avro", "Schema", True, "parse", "(File)", "", "Argument[0]", "path-injection", "ai-generated"]
- ["org.apache.avro", "SchemaParser", True, "parse", "(File)", "", "Argument[0]", "path-injection", "ai-generated"]
- ["org.apache.avro", "SchemaParser", True, "parse", "(File,Charset)", "", "Argument[0]", "path-injection", "ai-generated"]
- ["org.apache.avro", "SchemaParser", True, "parse", "(Path)", "", "Argument[0]", "path-injection", "ai-generated"]
- ["org.apache.avro", "SchemaParser", True, "parse", "(Path,Charset)", "", "Argument[0]", "path-injection", "ai-generated"]
- ["org.apache.avro", "SchemaParser", True, "parse", "(URI,Charset)", "", "Argument[0]", "request-forgery", "ai-generated"]
- addsTo:
pack: codeql/java-all
extensible: sourceModel
data:
- ["org.apache.avro", "Parser", True, "parse", "(File)", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro", "Protocol", True, "parse", "(File)", "", "ReturnValue", "file", "ai-generated"]
- ["org.apache.avro", "Schema", True, "parse", "(File)", "", "ReturnValue", "file", "ai-generated"]
@@ -0,0 +1,14 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/avro.git#79017ee391c04f60bdffd5fecf9ecc27c1b1f420 by codeql-mads-via-llm
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.apache.avro.reflect", "CustomEncoding", True, "read", "(Object,Decoder)", "", "Argument[1]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.reflect", "ReflectDatumReader", True, "read", "(Object,Schema,ResolvingDecoder)", "", "Argument[2]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.reflect", "ReflectDatumReader", True, "readArray", "(Object,Schema,ResolvingDecoder)", "", "Argument[2]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.reflect", "ReflectDatumReader", True, "readBytes", "(Object,Schema,Decoder)", "", "Argument[2]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.reflect", "ReflectDatumReader", True, "readField", "(Object,Schema$Field,Object,ResolvingDecoder,Object)", "", "Argument[3]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.reflect", "ReflectDatumReader", True, "readInt", "(Object,Schema,Decoder)", "", "Argument[2]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.reflect", "ReflectDatumReader", True, "readString", "(Object,Decoder)", "", "Argument[1]", "unsafe-deserialization", "ai-generated"]
@@ -0,0 +1,13 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/avro.git#79017ee391c04f60bdffd5fecf9ecc27c1b1f420 by codeql-mads-via-llm
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.apache.avro.specific", "SpecificDatumReader", True, "readField", "(Object,Schema$Field,Object,ResolvingDecoder,Object)", "", "Argument[3]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.specific", "SpecificDatumReader", True, "readRecord", "(Object,Schema,ResolvingDecoder)", "", "Argument[2]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.specific", "SpecificExceptionBase", True, "readExternal", "(ObjectInput)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.specific", "SpecificFixed", True, "readExternal", "(ObjectInput)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.specific", "SpecificRecordBase", True, "customDecode", "(ResolvingDecoder)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.avro.specific", "SpecificRecordBase", True, "readExternal", "(ObjectInput)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
@@ -0,0 +1,13 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/avro.git#79017ee391c04f60bdffd5fecf9ecc27c1b1f420 by codeql-mads-via-llm
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.apache.avro.util", "RandomData", True, "main", "(String[])", "", "Argument[0]", "path-injection", "ai-generated"]
- addsTo:
pack: codeql/java-all
extensible: sourceModel
data:
- ["org.apache.avro.util", "RandomData", True, "main", "(String[])", "", "Argument[0]", "commandargs", "ai-generated"]