Draft
25 commits
2ef114f
[template] feat(collector): copy-pasting the sentinelone probe as a s…
guzmud Apr 8, 2026
2460166
[template] feat(collector): withdrawing SentinelOne specific code and…
guzmud Apr 8, 2026
fc02033
[template] refactor(collector): expectation models moved to base models
guzmud Apr 21, 2026
d7e1739
[template] feat(protocols): adding source-related protocols
guzmud Apr 21, 2026
4eab80e
[template] refactor(settings): renaming configs to settings
guzmud Apr 21, 2026
390988d
[template] refactor(collector): exception models move to base models
guzmud Apr 22, 2026
22a31d4
[template] feat(protocols): adding collector protocols to the template
guzmud May 7, 2026
a069524
[template] feat(types): adding custom collector types
guzmud May 7, 2026
4ec8a1d
[template] feat(models): adding collector models to the template
guzmud May 7, 2026
cf38d6c
[template] feat(uploaders): adding resilient uploaders to internals
guzmud May 7, 2026
2839eef
[template] feat(utils): adding a retroport of batched for py3.11 support
guzmud May 7, 2026
4c39c9f
[template] feat(engine): adding the basic collector engine
guzmud May 7, 2026
1caa57c
[template] feat(collector): reworking the common collector elements
guzmud May 7, 2026
467ae69
[template] fix(collector): fixing various issues
guzmud May 7, 2026
8307439
[template] refactor(collector): improving code quality
guzmud May 7, 2026
012d176
[template] feat(types): adding BulkData generic type (+fix test)
guzmud May 11, 2026
c6eb06a
[template] fix(collector): add slugify for collector type
guzmud May 11, 2026
50443be
[template] fix(uploaders): error between results and valid_results
guzmud May 11, 2026
337a7a6
[template] fix(config): wrong aliases in config_loader
guzmud May 11, 2026
b7732ae
[template] fix(engine): moving try/except to inside the batch loop
guzmud May 13, 2026
576621b
[template] fix(models): fixing missing proper default factory in data…
guzmud May 13, 2026
49e2234
[template] feat(settings): moving template config to a generic custom…
guzmud May 13, 2026
fb2a085
[template] feat(collector): propagating custom config to sourcehandle…
guzmud May 13, 2026
781e4ed
[template] fix(config): propagating the renaming from template to custom
guzmud May 18, 2026
3bf40f2
[template] chore(README): updating the README to match the new archit…
guzmud May 18, 2026
11 changes: 11 additions & 0 deletions template/.dockerignore
@@ -0,0 +1,11 @@
# Configuration files
config.yml

# Build artifacts
dist

# Cache directories
__pycache__
.ruff_cache
.mypy_cache
.pytest_cache
10 changes: 10 additions & 0 deletions template/.gitignore
@@ -0,0 +1,10 @@
config.yml

# Build artifacts
dist

# Cache directories
__pycache__
.ruff_cache
.mypy_cache
.pytest_cache
319 changes: 319 additions & 0 deletions template/CONTRIBUTING.md
@@ -0,0 +1,319 @@
# Contributing to Template Collector

This document provides guidance for contributing to the Template collector for OpenAEV. The collector is feature-complete and ships with a Template-specific implementation.

## Current Implementation Status

**COMPLETED**: The Template collector is fully implemented with the following components:

### Core Components
- **Collector Core** ([`src/collector/collector.py`](src/collector/collector.py)) - Main daemon with Template service integration
- **Expectation Handler** ([`src/collector/expectation_handler.py`](src/collector/expectation_handler.py)) - Generic handler using service provider pattern
- **Expectation Manager** ([`src/collector/expectation_manager.py`](src/collector/expectation_manager.py)) - Batch processing and API interactions
- **Configuration System** ([`src/models/configs/`](src/models/configs/)) - Hierarchical configuration with Template settings
- **Service Providers** - Complete Template-specific implementation

### Template Implementation
- **Data Fetcher** ([`src/services/fetcher_data.py`](src/services/fetcher_data.py)) - Prevention data correlation
- **Expectation Service** ([`src/services/expectation_service.py`](src/services/expectation_service.py)) - Business logic implementation
- **Trace Service** ([`src/services/trace_service.py`](src/services/trace_service.py)) - Trace creation
- **Data Converter** ([`src/services/converter.py`](src/services/converter.py)) - Template to OAEV format conversion

### Supported Features
- **Signature Support**: `start_date`, `end_date`
- **Retry Mechanism**: Configurable retries with ingestion delay handling
- **Trace Generation**: Traces link back to the external tool
- **Error Handling**: Comprehensive exception handling and logging
- **Configuration Management**: YAML, environment variables, defaults
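
The retry mechanism listed above can be pictured roughly as follows. This is a minimal sketch, not the collector's actual code: `fetch_with_retries`, `max_retries`, and `delay_seconds` are hypothetical names, and it assumes that an empty result means the external tool has not finished ingesting the data yet.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], Optional[T]],
    max_retries: int = 3,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch until it returns data or the attempts run out."""
    for attempt in range(1, max_retries + 1):
        result = fetch()
        if result:
            return result
        # Wait for ingestion before the next attempt, but not after the last one.
        if attempt < max_retries:
            sleep(delay_seconds)
    return None
```

Passing `sleep` as a parameter keeps the delay testable without actually waiting.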

## Installation and Setup

### Poetry Dependency Groups

- `--with dev`: Development tools (ruff, mypy, black, etc.)
- `--with test`: Testing tools (pytest, coverage, etc.)

### Poetry Extras

- `-E prod` (`--extras prod`): Get pyoaev from PyPI (production releases)
- `-E current` (`--extras current`): Get pyoaev from Git release/current branch
- `-E local` (`--extras local`): Get pyoaev locally from `../../client-python`

### Development Installation

```bash
# Development setup with current pyoaev version
poetry install -E current --with dev,test

# Production setup
poetry install -E prod

# Local development with local pyoaev
poetry install -E local --with dev,test
```

### Running the Collector

```bash
# Direct execution
TemplateCollector

# Using Python module execution
python -m src

# Using Poetry to run
poetry run python -m src
```

## Development Workflow

### Setting Up Development Environment

1. **Clone and Install**:
```bash
git clone <collector-repo>
cd template
poetry install -E current --with dev,test
```

2. **Configure for Development**:
```bash
# Copy sample config
cp src/config.yml.sample src/config.yml

# Edit with your Template details
vim src/config.yml
```

3. **Run Development Tools**:
```bash
# Format code
poetry run black src/

# Lint code
poetry run ruff check src/

# Type checking
poetry run mypy src/

# Run tests
poetry run pytest
```

### Code Organization

The codebase follows a clean architecture with clear separation of concerns:

```
src/
├── collector/                  # Generic collector framework
│   ├── collector.py            # Main collector daemon
│   ├── expectation_handler.py
│   ├── expectation_manager.py
│   ├── trace_manager.py
│   └── models.py               # Pydantic data models
├── services/                   # Template-specific implementation
│   ├── expectation_service.py  # Business logic
│   ├── trace_service.py        # Trace creation
│   ├── converter.py            # Data conversion
│   ├── fetcher_*.py            # Data fetchers
│   └── model_*.py              # Data models
└── models/                     # Configuration management
    └── configs/                # Hierarchical config system
```

## Testing

### Test Structure

```bash
# Run all tests
poetry run pytest

# Run specific test files
poetry run pytest tests/test_expectation_service.py

# Run with verbose output
poetry run pytest -v
```

### Test Categories

- **Unit Tests**: Test individual components in isolation
- **Integration Tests**: Test external tool interactions
- **Configuration Tests**: Validate config loading and validation
- **Service Provider Tests**: Test expectation handling logic

## Code Quality Standards

### Formatting and Linting

- **Black**: Code formatting (line length: 88)
- **Ruff**: Fast Python linter
- **MyPy**: Static type checking
- **Pre-commit**: Automated checks before commits

### Code Style Guidelines

- Use type hints throughout
- Follow Python PEP 8 conventions
- Write descriptive docstrings for public methods
- Implement comprehensive error handling
- Add meaningful logging with appropriate levels
- Use Pydantic models for data validation

### Error Handling Patterns

```python
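# Use custom exceptions from src/collector/exception.py
import logging

from src.collector.exception import CollectorProcessingError

logger = logging.getLogger(__name__)

# TemplateServiceError and process_expectation are assumed to be provided
# by the Template service layer.
try:
    result = process_expectation(expectation)
except TemplateServiceError as e:
    logger.error(f"Template error: {e}")
    raise CollectorProcessingError(f"Processing failed: {e}") from e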
```

### Logging Best Practices

```python
# Use consistent log prefixes
LOG_PREFIX = "[ComponentName]"

# Include context in error logs
logger.error(
    f"{LOG_PREFIX} Error processing expectation: {e} "
    f"(Context: expectation_id={expectation_id}, retry_count={retries})"
)
```

## Contributing Guidelines

### Making Changes

1. **Create Feature Branch**:
```bash
git checkout -b feature/your-feature-name
```

2. **Make Changes**:
- Follow existing code patterns
- Add/update tests
- Update documentation
- Ensure type hints are complete

3. **Test Changes**:
```bash
poetry run pytest
poetry run mypy src/
poetry run ruff check src/
```

4. **Commit and Push**:
```bash
git add .
git commit -m "feat: description of your changes"
git push origin feature/your-feature-name
```

### Pull Request Guidelines

- Provide clear description of changes
- Update documentation as needed
- Ensure all CI checks pass
- Request review from maintainers

### Extending the Collector

#### Adding New Signature Types

1. Update `SUPPORTED_SIGNATURES` in [`src/services/expectation_service.py`](src/services/expectation_service.py)
2. Update fetching processes in [`src/services/fetcher_data.py`](src/services/fetcher_data.py)
3. Update data conversion logic in [`src/services/converter.py`](src/services/converter.py)
4. Add corresponding tests
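
As a rough illustration of step 1, the sketch below shows how a set of supported signature types might gate which signatures are used for querying. It is an assumption about the shape of the code, not a copy of it: the real `SUPPORTED_SIGNATURES` may be defined differently, and `split_signatures` is a hypothetical helper.

```python
from typing import Mapping

# Hypothetical stand-in for the constant in expectation_service.py.
SUPPORTED_SIGNATURES: frozenset[str] = frozenset({"start_date", "end_date"})


def split_signatures(
    signatures: Mapping[str, str],
    supported: frozenset[str] = SUPPORTED_SIGNATURES,
) -> tuple[dict[str, str], list[str]]:
    """Separate signatures the collector can query on from the ones it ignores."""
    usable = {k: v for k, v in signatures.items() if k in supported}
    ignored = sorted(k for k in signatures if k not in supported)
    return usable, ignored
```

Under this sketch, supporting a new type such as `hostname` would mean extending the set, then teaching the fetcher and converter what to do with the new key.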

#### Adding New API Endpoints

1. Create fetcher class following pattern of existing fetchers
2. Update client API to use new fetcher
3. Add data models in `src/services/model_*.py`
4. Update service provider logic
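
The existing fetchers are the reference for step 1. For readers who have not opened them yet, here is a hedged sketch of what "a fetcher plus its data model" could look like. Every name in it (`ProcessEvent`, `Fetcher`, `InMemoryProcessFetcher`) is hypothetical, and it uses a standard-library dataclass to stay self-contained where the project itself uses Pydantic models.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Protocol


@dataclass(frozen=True)
class ProcessEvent:
    """Hypothetical data model, standing in for a src/services/model_*.py class."""

    event_id: str
    observed_at: datetime


class Fetcher(Protocol):
    """Hypothetical shape shared by fetchers: query a time window, return models."""

    def fetch(self, start: datetime, end: datetime) -> list[ProcessEvent]: ...


class InMemoryProcessFetcher:
    """Toy fetcher that filters a preloaded list instead of calling a real API."""

    def __init__(self, events: list[ProcessEvent]) -> None:
        self._events = events

    def fetch(self, start: datetime, end: datetime) -> list[ProcessEvent]:
        # Keep only the events observed inside the requested window (inclusive).
        return [e for e in self._events if start <= e.observed_at <= end]
```

Typing callers against a protocol rather than a concrete class makes it easy to swap in a fake fetcher in tests.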

#### Configuration Changes

1. Add fields to appropriate config models in `src/models/configs/`
2. Update config loader and validation
3. Update sample configuration files
4. Document new configuration options

## Template Adaptation

This collector is built on a reusable foundation that can be adapted for other security platforms. To create a similar collector for another platform (e.g., CrowdStrike, Microsoft Defender), update the references listed below and reuse the platform-agnostic components.

### Template-Specific References to Change

#### Configuration Files
- [ ] `pyproject.toml` - Update project name and script names
- [ ] [`src/config.yml`](src/config.yml) - Update collector ID
- [ ] [`src/config.yml.sample`](src/config.yml.sample) - Update sample configuration

#### Code References
- [ ] [`src/services/utils/config_loader.py`](src/services/utils/config_loader.py) - Rename config classes
- [ ] [`src/collector/collector.py`](src/collector/collector.py) - Update service imports
- [ ] [`src/models/configs/collector_configs.py`](src/models/configs/collector_configs.py) - Update defaults
- [ ] Platform-specific service implementations in `src/services/`

### Reusable Components

The following components are platform-agnostic and can be reused:
- Generic collector daemon
- Service provider protocols
- Configuration management system
- Expectation processing pipeline
- Signature registry system
- Trace management system

## Common Issues and Solutions

### Development Issues

#### Import Errors
- Ensure Poetry environment is activated
- Check that all dependencies are installed with correct extras

#### Configuration Loading
- Verify YAML structure matches Pydantic models
- Check environment variable naming conventions
- Validate required fields are present
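
When debugging which value wins, it can help to reason about the lookup order explicitly. The sketch below assumes environment variables override YAML, which in turn overrides defaults, and it assumes a `SECTION_KEY` naming scheme for environment variables. Both are assumptions for illustration; check the config loader for the actual precedence and naming. `resolve_setting` is a hypothetical helper.

```python
import os
from typing import Any, Mapping


def resolve_setting(
    section: str,
    key: str,
    yaml_config: Mapping[str, Mapping[str, Any]],
    defaults: Mapping[str, Any],
    environ: Mapping[str, str] = os.environ,
) -> Any:
    """Look a value up in env vars, then the parsed YAML, then the defaults."""
    env_name = f"{section}_{key}".upper()  # assumed naming convention
    if env_name in environ:
        return environ[env_name]
    if key in yaml_config.get(section, {}):
        return yaml_config[section][key]
    return defaults.get(key)
```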

#### API Integration Testing
- Use mock objects for unit tests
- Set up test Template environment for integration tests
- Handle rate limits in test environments
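
A unit test with a mock client might look like the sketch below. `count_detections` and `search_alerts` are hypothetical names used only to show the pattern; substitute the real service function and client method.

```python
from unittest.mock import Mock


def count_detections(client, start: str, end: str) -> int:
    """Hypothetical function under test: counts alerts returned by an API client."""
    alerts = client.search_alerts(start=start, end=end)
    return len(alerts)


def test_count_detections_uses_time_window() -> None:
    # The mock replaces the real API client, so no network call is made.
    client = Mock()
    client.search_alerts.return_value = [{"id": "1"}, {"id": "2"}]

    assert count_detections(client, "2026-05-01", "2026-05-02") == 2
    client.search_alerts.assert_called_once_with(start="2026-05-01", end="2026-05-02")
```

Asserting on the call arguments, and not only on the return value, catches regressions where the time window is built incorrectly.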

### Production Issues

#### Performance Optimization
- Monitor API response times and adjust retry intervals
- Use batch processing for large expectation sets
- Optimize query time windows based on data volume
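
Batch processing depends on splitting a large collection into fixed-size chunks. `itertools.batched` only exists from Python 3.12, so on Python 3.11 a small fallback is needed. The version below is a minimal sketch of such a fallback and is not necessarily identical to the helper shipped with this template.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(iterable: Iterable[T], n: int) -> Iterator[tuple[T, ...]]:
    """Yield successive tuples of at most n items, like itertools.batched (3.12+)."""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    # islice pulls up to n items per pass; an empty tuple ends the loop.
    while batch := tuple(islice(iterator, n)):
        yield batch
```

For example, `list(batched(range(5), 2))` yields two full batches and one final batch holding the remaining item.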

#### Error Recovery
- Implement circuit breakers for persistent API failures
- Add health checks for service monitoring
- Use graceful degradation when possible
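
For contributors unfamiliar with the circuit breaker pattern, here is a minimal sketch. It is illustrative only and not part of the template: `CircuitBreaker` and its parameters are hypothetical, and a production version would usually also need thread safety and logging.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stops calling a failing dependency until a cool-down period has passed."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """Return True if a call may be attempted right now."""
        if self._opened_at is None:
            return True
        # Once the cool-down has elapsed, calls are allowed through again.
        return self._clock() - self._opened_at >= self.reset_after

    def record_success(self) -> None:
        """Close the circuit and forget earlier failures."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Count a failure and open the circuit once the threshold is reached."""
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

A caller would check `allow_request()` before each API call, then report the outcome with `record_success()` or `record_failure()`.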

## Documentation

### Code Documentation
- Write clear docstrings for all public interfaces
- Include type hints and parameter descriptions
- Provide usage examples for complex functions

### Configuration Documentation
- Document all configuration options
- Provide example configurations for different scenarios
- Include troubleshooting guides for common issues

This collector provides a production-ready Template integration for OpenAEV with comprehensive error handling, configurable retry logic, and detailed trace generation.
32 changes: 32 additions & 0 deletions template/Dockerfile
@@ -0,0 +1,32 @@
FROM python:3.13-alpine AS builder

# poetry version available on Ubuntu 24.04
RUN pip3 install poetry==2.1.3

RUN apk update && apk upgrade

ARG installdir=/collector
ADD . ${installdir}
RUN cd ${installdir} && poetry build

FROM python:3.13-alpine AS runner

# Declare the build argument
ARG PYOAEV_GIT_BRANCH_OVERRIDE

ARG installdir=/collector
COPY --from=builder ${installdir} ${installdir}
RUN cd ${installdir}/dist && pip3 install --no-cache-dir "$(ls *.whl)[prod]"

RUN if [[ ${PYOAEV_GIT_BRANCH_OVERRIDE} ]] ; then \
    echo "Forcing specific version of client-python" && \
    apk add --no-cache git && \
    pip install pip3-autoremove && \
    pip-autoremove pyoaev -y && \
    pip install git+https://github.com/OpenAEV-Platform/client-python@${PYOAEV_GIT_BRANCH_OVERRIDE} ; \
    fi

# necessary for icon location
WORKDIR ${installdir}

CMD ["python3", "-m", "src"]