Models

FileData

Bases: BaseModel

sha

sha: str = Field(default='', description='File SHA')

filename

filename: str = Field(..., description='File path and name')

status

status: str = Field(..., description='File status (added, modified, removed)')

additions

additions: int = Field(..., description='Number of lines added')

deletions

deletions: int = Field(..., description='Number of lines deleted')

changes

changes: int = Field(..., description='Total number of changes')

blob_url

blob_url: str | None = Field(default='', description='Blob URL for the file')

raw_url

raw_url: str | None = Field(default='', description='Raw URL for the file content')

contents_url

contents_url: str = Field(default='', description='Contents URL for the file')

before_edit

before_edit: str = Field(default='', description='File content before changes')

after_edit

after_edit: str = Field(default='', description='File content after changes')

patch

patch: str = Field(default='', description='Git patch/diff')

TrainingData

Bases: BaseModel

pr_info

pr_info: PullRequest = Field(..., description='Pull request information', exclude=True)

question

question: str = Field(..., description='Formatted question based on PR title and description')

files

files: list[FileData] = Field(default=[], description='List of modified files')

ExtractionResult

Bases: BaseModel

Methods:

Name        Description
save_log    Save extraction results to a timestamped JSON log file.
a_save_log  Save extraction results to a timestamped JSON log file asynchronously.

repository

repository: str = Field(..., description='Repository name in format Mai0313/SWEBenchV2')

extracted_at

extracted_at: str = Field(..., description='Extraction timestamp')

total_prs

total_prs: int = Field(..., description='Total number of PRs processed')

prs

prs: list[TrainingData] = Field(default=[], description='List of training data for each PR')
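
The three models nest as ExtractionResult -> TrainingData -> FileData, and a log written by save_log mirrors that nesting as JSON. The sketch below shows one way such a log could be consumed with only the standard library. It is an illustration under assumptions, not part of the project: `summarize_log` is a hypothetical helper and `sample_log` is made-up data, including its `extracted_at` format. Because `pr_info` is declared with `exclude=True`, it does not appear in the serialized output, and since save_log dumps with `exclude_unset=True`, fields that were never set explicitly may be missing, so the helper reads list fields with a default.

```python
def summarize_log(log: dict) -> dict:
    """Hypothetical helper: count PRs, files, and changed lines in a log dict."""
    prs = log.get("prs", [])  # may be absent if the field was never set
    files = [f for pr in prs for f in pr.get("files", [])]
    return {
        "repository": log["repository"],
        "prs": len(prs),
        "files": len(files),
        "additions": sum(f["additions"] for f in files),
        "deletions": sum(f["deletions"] for f in files),
    }


# Made-up sample data shaped like ExtractionResult -> TrainingData -> FileData.
sample_log = {
    "repository": "Mai0313/SWEBenchV2",
    "extracted_at": "2025-01-01T00:00:00",
    "total_prs": 1,
    "prs": [
        {
            "question": "Fix typo in README",
            "files": [
                {
                    "filename": "README.md",
                    "status": "modified",
                    "additions": 1,
                    "deletions": 1,
                    "changes": 2,
                },
            ],
        }
    ],
}

summary = summarize_log(sample_log)
```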

save_log

save_log() -> Path

Save extraction results to a timestamped JSON log file.

Creates a JSON file containing all extraction results in a structured format, organized by repository name and timestamp for easy tracking and analysis.

Returns:

Name  Type  Description
Path  Path  Path to the created log file.

Source code in src/swebenchv2/typings/models.py
def save_log(self) -> Path:
    """Save extraction results to a timestamped JSON log file.

    Creates a JSON file containing all extraction results in a structured format,
    organized by repository name and timestamp for easy tracking and analysis.

    Returns:
        Path: Path to the created log file.
    """
    now = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_log = Path(f"./data/{self.repository}/log_{now}.json")
    output_log.parent.mkdir(parents=True, exist_ok=True)
    log_dict = self.model_dump(mode="json", exclude_none=True, exclude_unset=True)
    output_log.write_text(json.dumps(log_dict, ensure_ascii=False, indent=2), encoding="utf-8")
    return output_log
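
The path handling in save_log can be tried in isolation. The following is a minimal sketch using only the standard library, not the project's code: `write_log` and its `base_dir` parameter are hypothetical names, and a plain dict stands in for the output of `model_dump`. It writes into a temporary directory instead of `./data` so that running it leaves nothing behind.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def write_log(repository: str, payload: dict, base_dir: Path) -> Path:
    """Hypothetical stand-in for save_log that writes under base_dir."""
    # Same timestamp format as save_log, e.g. 20250101_123000.
    now = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_log = base_dir / repository / f"log_{now}.json"
    # "owner/repo" yields two directory levels, so parents=True is required.
    output_log.parent.mkdir(parents=True, exist_ok=True)
    output_log.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return output_log


with tempfile.TemporaryDirectory() as tmp:
    payload = {"repository": "Mai0313/SWEBenchV2", "total_prs": 0}
    log_path = write_log("Mai0313/SWEBenchV2", payload, Path(tmp))
    relative_parts = log_path.relative_to(tmp).parts
    restored = json.loads(log_path.read_text(encoding="utf-8"))
```

Since the timestamp has one-second resolution, two calls for the same repository within the same second produce the same file name, and the later call overwrites the earlier file.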

a_save_log

a_save_log() -> Path

Save extraction results to a timestamped JSON log file asynchronously.

Asynchronously creates a JSON file containing all extraction results, running the file I/O operations in a separate thread to avoid blocking.

Returns:

Name  Type  Description
Path  Path  Path to the created log file.

Source code in src/swebenchv2/typings/models.py
async def a_save_log(self) -> Path:
    """Save extraction results to a timestamped JSON log file asynchronously.

    Asynchronously creates a JSON file containing all extraction results,
    running the file I/O operations in a separate thread to avoid blocking.

    Returns:
        Path: Path to the created log file.
    """
    return await asyncio.to_thread(self.save_log)
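
To illustrate the pattern a_save_log relies on, here is a small sketch of `asyncio.to_thread` (available since Python 3.9) wrapping a blocking function. `blocking_save` is a hypothetical stand-in for save_log that sleeps instead of writing a file; nothing here is taken from the project beyond the `to_thread` call itself.

```python
import asyncio
import time


def blocking_save() -> str:
    """Hypothetical stand-in for a blocking method such as save_log."""
    time.sleep(0.05)  # simulate file I/O
    return "log_written"


async def main() -> list[str]:
    # Each call runs in a worker thread, so the event loop stays free
    # and the two saves can overlap.
    return await asyncio.gather(
        asyncio.to_thread(blocking_save),
        asyncio.to_thread(blocking_save),
    )


results = asyncio.run(main())
```

In an application that already runs an event loop, the equivalent call on a model instance would be `await result.a_save_log()`, where `result` is assumed to be an ExtractionResult.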