8 changes: 7 additions & 1 deletion docs-website/docs/concepts/data-classes.mdx
@@ -9,7 +9,7 @@ description: "In Haystack, there are a handful of core classes that are regularl

In Haystack, there are a handful of core classes that are regularly used in many different places. These are classes that carry data through the system and you are likely to interact with these as either the input or output of your pipeline.

Haystack uses data classes to help components communicate with each other in a simple and modular way. By doing this, data flows seamlessly through the Haystack pipelines. This page goes over the available data classes in Haystack: ByteStream, Answer (along with its variants ExtractedAnswer and GeneratedAnswer), ChatMessage, Document, and StreamingChunk, explaining how they contribute to the Haystack ecosystem.
Haystack uses data classes to help components communicate with each other in a simple and modular way. By doing this, data flows seamlessly through the Haystack pipelines. This page goes over the available data classes in Haystack: ByteStream, Answer (along with its variants ExtractedAnswer and GeneratedAnswer), ChatMessage, FileContent, Document, and StreamingChunk, explaining how they contribute to the Haystack ecosystem.

You can check out the detailed parameters in our [Data Classes](/reference/data-classes-api) API reference.

@@ -120,6 +120,12 @@ image = ByteStream.from_file_path("dog.jpg")

Read the detailed documentation for the `ChatMessage` data class on a dedicated [ChatMessage](data-classes/chatmessage.mdx) page.

### FileContent

`FileContent` represents a file payload that can be attached to a `ChatMessage`, including base64 data, MIME type, filename, and provider-specific metadata.

Read the detailed documentation for the `FileContent` data class on a dedicated [FileContent](data-classes/filecontent.mdx) page.

### Document

#### Overview
119 changes: 119 additions & 0 deletions docs-website/docs/concepts/data-classes/filecontent.mdx
@@ -0,0 +1,119 @@
---
title: "FileContent"
id: filecontent
slug: "/filecontent"
description: "`FileContent` represents file payloads in chat messages, including base64 data, MIME type, filename, and provider-specific metadata."
---

# FileContent

`FileContent` represents a file payload that can be attached to a [`ChatMessage`](chatmessage.mdx). Use it when a chat model accepts file inputs, such as PDFs or other documents, together with the user's text prompt.

If you need the full list of parameters and methods, see the [`FileContent` API reference](/reference/data-classes-api#filecontent).

## Attributes

```python
@dataclass
class FileContent:
    base64_data: str
    mime_type: str | None = None
    filename: str | None = None
    extra: dict[str, Any] = field(default_factory=dict)
    validation: bool = True
```

- `base64_data` stores the file content as a base64-encoded string.
- `mime_type` identifies the file type, for example `application/pdf`. Providing it explicitly is recommended because many model providers require it.
- `filename` is optional, but some providers use it when processing uploaded files.
- `extra` can store provider-specific metadata. Values should be JSON serializable.
- `validation` controls whether Haystack checks that `base64_data` is valid and tries to infer the MIME type when one is not provided. It defaults to `True`.
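
Because values in `extra` should be JSON serializable, it can help to check a metadata dictionary before attaching it. The following is a minimal sketch that uses only the standard library; `is_json_serializable` is a hypothetical helper and not part of Haystack.

```python
import json
from typing import Any


def is_json_serializable(extra: dict[str, Any]) -> bool:
    """Hypothetical helper: return True if `extra` can be dumped to JSON."""
    try:
        json.dumps(extra)
    except (TypeError, ValueError):
        # TypeError: unsupported value type (for example a set)
        # ValueError: circular reference
        return False
    return True


print(is_json_serializable({"source": "finance"}))
print(is_json_serializable({"tags": {"a", "b"}}))
```

The second call passes a `set`, which the `json` module cannot encode, so the helper reports it as not serializable.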

## Create from a file path

Use `from_file_path` to read a local file, base64-encode it, infer the MIME type from the path, and populate the filename.

```python
from haystack.dataclasses import ChatMessage, FileContent

file_content = FileContent.from_file_path("data/attention-is-all-you-need.pdf")

message = ChatMessage.from_user(
    content_parts=[
        file_content,
        "Summarize the key ideas in this paper.",
    ]
)
```

Pass `filename` or `extra` when a provider expects a specific filename or provider-specific options:

```python
file_content = FileContent.from_file_path(
    "data/report.pdf",
    filename="quarterly-report.pdf",
    extra={"source": "finance"},
)
```
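
To make the steps behind `from_file_path` concrete, here is a rough standard-library sketch of reading a file, base64-encoding it, inferring the MIME type from the path, and taking the filename. `describe_file` is a hypothetical helper written for illustration; it is an assumption about the general approach, not Haystack's actual implementation.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def describe_file(path: str) -> dict:
    """Hypothetical helper: read, encode, and describe a local file."""
    file_path = Path(path)
    raw = file_path.read_bytes()
    # guess_type infers the MIME type from the file extension only
    mime_type, _ = mimetypes.guess_type(file_path.name)
    return {
        "base64_data": base64.b64encode(raw).decode("utf-8"),
        "mime_type": mime_type,
        "filename": file_path.name,
    }


# Demonstrate on a small temporary file so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "report.pdf"
    sample.write_bytes(b"%PDF-1.4 sample")
    info = describe_file(str(sample))

print(info["mime_type"], info["filename"])
```

Extension-based inference returns `None` for unknown extensions, which is one reason to pass the MIME type explicitly when you know it.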

## Create from a URL

Use `from_url` to download a file and convert it into a `FileContent` instance.

```python
from haystack.dataclasses import FileContent

file_content = FileContent.from_url(
    "https://example.com/reports/quarterly-report.pdf",
    timeout=30,
)
```

If no filename is provided, Haystack uses the final path segment of the URL.
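
As an illustration of what "final path segment" means, the sketch below derives a filename from a URL with the standard library. `filename_from_url` is a hypothetical helper; whether Haystack also drops query strings or decodes percent-escapes the same way is an assumption here.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def filename_from_url(url: str) -> str:
    """Hypothetical helper: return the last path segment of a URL."""
    # urlparse separates the path from the query string and fragment
    path = urlparse(url).path
    return unquote(PurePosixPath(path).name)


print(filename_from_url("https://example.com/reports/quarterly-report.pdf"))
```

If the URL does not end in a meaningful name, passing `filename` explicitly avoids relying on this fallback.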

## Create from base64 data

If you already have file bytes, encode them and pass the MIME type explicitly.

```python
import base64
from pathlib import Path

from haystack.dataclasses import FileContent

data = Path("data/manual.pdf").read_bytes()
file_content = FileContent(
    base64_data=base64.b64encode(data).decode("utf-8"),
    mime_type="application/pdf",
    filename="manual.pdf",
)
```

Set `validation=False` only when the base64 data and MIME type are already trusted and you want to skip validation.
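
If you skip validation, you may still want to check untrusted input yourself. The sketch below shows one way to test whether a string is well-formed base64 using only the standard library. `is_valid_base64` is a hypothetical helper, and the check Haystack performs internally may differ.

```python
import base64
import binascii


def is_valid_base64(data: str) -> bool:
    """Hypothetical helper: return True if `data` decodes as strict base64."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


encoded = base64.b64encode(b"hello").decode("utf-8")
print(is_valid_base64(encoded))
print(is_valid_base64("not base64!!"))
```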

## Inspect files in a ChatMessage

After adding `FileContent` to a `ChatMessage`, use the `file` and `files` properties to access file payloads.

```python
from haystack.dataclasses import ChatMessage, FileContent

file_content = FileContent.from_file_path("data/invoice.pdf")
message = ChatMessage.from_user(content_parts=[file_content, "Extract the invoice total."])

print(message.file)
print(message.files)
```

`message.file` returns the first file payload, or `None` if there are no files. `message.files` returns all file payloads.

## Serialization

Use `to_dict` and `from_dict` to serialize and restore file content.

```python
payload = file_content.to_dict()
restored = FileContent.from_dict(payload)
```

For tracing, Haystack replaces the full base64 payload with a placeholder so large files are not sent to the tracing backend.
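
The idea of swapping the payload for a placeholder can be sketched on a plain dictionary. This is only an illustration: `redact_for_tracing` and the placeholder text are hypothetical, and the `base64_data` key is assumed from the attribute name rather than taken from the actual `to_dict` output.

```python
from typing import Any

# Hypothetical placeholder text, not the string Haystack uses
PLACEHOLDER = "<base64 data omitted>"


def redact_for_tracing(payload: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: copy `payload` with the base64 data replaced."""
    redacted = dict(payload)  # shallow copy so the original stays untouched
    if "base64_data" in redacted:
        redacted["base64_data"] = PLACEHOLDER
    return redacted


original = {"base64_data": "JVBERi0xLjQ=", "mime_type": "application/pdf"}
print(redact_for_tracing(original))
```

Returning a copy keeps the original payload intact, so the same object can still be sent to the model provider.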
1 change: 1 addition & 0 deletions docs-website/sidebars.js
@@ -78,6 +78,7 @@ export default {
},
items: [
'concepts/data-classes/chatmessage',
'concepts/data-classes/filecontent',
],
},
{
@@ -73,7 +73,8 @@
"id": "concepts/data-classes"
},
"items": [
"concepts/data-classes/chatmessage"
"concepts/data-classes/chatmessage",
"concepts/data-classes/filecontent"
]
},
{