Hello, thanks for your great work! I have a question regarding the SFT training data.
I noticed that a complete "think" sequence is split across different timestamps, and I'm curious what criterion is used for this segmentation. For example:
{"think": "After user query, confirmed driver’s right hand is at the bottom of the wheel. Vehicle", "timestamp": 10.5},
{"think": "still moving steadily. Isolation persists—no other signs of life or traffic in this desert journey.", "timestamp": 11.5}
Here, the sentence is split in the middle (“Vehicle” → “still moving steadily”). Could you clarify how these boundaries are determined? Is the segmentation based on fixed temporal windows, token length, streaming chunks, or some other strategy?
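To make the question concrete, here is a minimal sketch of one of the strategies I'm guessing at: a fixed token budget per fixed-interval timestamp, which would naturally cut a sentence mid-way as in the example above. All names and parameters here (`segment_think`, `tokens_per_step`, `dt`) are my own assumptions for illustration, not your actual pipeline.

```python
def segment_think(think: str, start_ts: float,
                  tokens_per_step: int = 16, dt: float = 1.0):
    """Hypothetical: emit one record per dt-second frame, each holding at
    most tokens_per_step tokens of the full think string."""
    words = think.split()  # crude whitespace "tokenizer" for illustration
    records = []
    for i in range(0, len(words), tokens_per_step):
        chunk = " ".join(words[i:i + tokens_per_step])
        step = i // tokens_per_step
        records.append({"think": chunk,
                        "timestamp": round(start_ts + step * dt, 1)})
    return records
```

Under this scheme the split point is purely a token-count artifact, with no regard for sentence boundaries. Is it something like this, or are the boundaries instead tied to streaming chunk arrival or another signal?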