Data populated ratio #330

Merged
mathieudpnt merged 23 commits into Project-OSmOSE:main from Gautzilla:data-fill-rate
Feb 23, 2026

@Gautzilla Gautzilla commented Jan 26, 2026

🐋 What's new?

This PR adds the following properties and methods to the API to help users filter out data:

| Property/Method name | Description | Done |
| --- | --- | --- |
| BaseData.populated_duration | Return the total duration of non-empty items | |
| BaseData.populated_ratio | Return the ratio between the duration of non-empty items and the duration of the data | |
| SpectroData.populated_duration | Return the total duration of non-empty items if the SpectroData has associated files, otherwise the populated_duration of the linked AudioData | |
| BaseDataset.remove_data_on_populated_ratio(threshold=0.) | Remove all data that have a populated ratio lower than threshold | |
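
As a rough illustration of the two properties described in the table, here is a minimal, self-contained sketch. `ToyItem` and `ToyData` are hypothetical stand-ins written for this example, not the library's classes, and the sketch assumes that items do not overlap and that the data has a non-zero duration. The actual implementation in this PR may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ToyItem:
    """Hypothetical stand-in for one item of a data object."""

    begin: datetime
    end: datetime
    is_empty: bool  # True when no file backs this time span

    @property
    def duration(self) -> timedelta:
        return self.end - self.begin


@dataclass
class ToyData:
    """Hypothetical stand-in for a data object made of consecutive items."""

    items: list[ToyItem]

    @property
    def duration(self) -> timedelta:
        # Total span covered by the data, empty parts included.
        return max(i.end for i in self.items) - min(i.begin for i in self.items)

    @property
    def populated_duration(self) -> timedelta:
        # Sum of the durations of the non-empty items only.
        return sum((i.duration for i in self.items if not i.is_empty), timedelta())

    @property
    def populated_ratio(self) -> float:
        # Dividing two timedeltas yields a float.
        return self.populated_duration / self.duration


start = datetime(2026, 1, 26, 22, 35, 41)
toy = ToyData(
    [
        ToyItem(start, start + timedelta(seconds=2), is_empty=False),
        ToyItem(start + timedelta(seconds=2), start + timedelta(seconds=5), is_empty=True),
    ]
)
print(toy.populated_duration)  # 0:00:02
print(toy.populated_ratio)  # 0.4
```

In this toy case, 2 seconds out of 5 are backed by a non-empty item, hence a ratio of 0.4.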

🐬 Note

This PR resolves #295

@Gautzilla Gautzilla requested a review from mathieudpnt January 26, 2026 15:34
@Gautzilla Gautzilla self-assigned this Jan 26, 2026
@Gautzilla Gautzilla changed the title [DRAFT] Data populated ratio Data populated ratio Jan 27, 2026
@Gautzilla Gautzilla marked this pull request as ready for review January 27, 2026 14:59
@Gautzilla
Contributor Author

@mathieudpnt
[image]

@mathieudpnt mathieudpnt left a comment

LGTM, but as we discussed, I would like some way to trace the removed data.
If the remove_empty_data method returns the dismissed data, that would be all good 👌

@Gautzilla
Contributor Author

> If the remove_empty_data method returns the dismissed data, that would be all good 👌

my-goodness-what-an-idea

@Gautzilla
Contributor Author

> LGTM, but as we discussed, I would like some way to trace the removed data. If the remove_empty_data method returns the dismissed data, that would be all good 👌

Done, you can use it like so:

```python
# Filter the AudioDataset; the removed data is returned
removed_data = audio_dataset.remove_empty_data(threshold=0.0)

# Take a look at which data has been removed:
print(f"{' REMOVED DATA ':#^60}")
print(f"{'Begin':<20}{'Duration':^20}{'Fill rate':>20}")
for data in removed_data:
    print(
        f"{data.begin.strftime('%H:%M:%S'):<20}{str(data.duration):^20}{str(data.populated_ratio) + ' %':>20}"
    )

"""
prints:

####################### REMOVED DATA #######################
Begin                     Duration                 Fill rate
22:35:41              0 days 00:00:05                  0.0 %
22:35:45              0 days 00:00:05                  0.0 %
22:35:48              0 days 00:00:05                  0.0 %
22:35:52              0 days 00:00:05                  0.0 %
"""
```
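
For readers who want to see the "filter and return what was removed" pattern in isolation, here is a minimal sketch using hypothetical `ToyEntry` and `ToyDataset` classes (not the library's). The `<=` comparison is an assumption: the PR description says "lower than threshold", while the output above shows data with a ratio of 0.0 being removed at `threshold=0.0`, so the exact boundary behaviour of the real method should be checked against the merged code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEntry:
    """Hypothetical stand-in for a data object with a known populated ratio."""

    name: str
    populated_ratio: float


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset holding several data objects."""

    data: list[ToyEntry] = field(default_factory=list)

    def remove_empty_data(self, threshold: float = 0.0) -> list[ToyEntry]:
        # Assumption: entries whose ratio is <= threshold are removed, so that
        # threshold=0.0 drops fully empty entries.
        removed = [d for d in self.data if d.populated_ratio <= threshold]
        # Keep the remaining entries in the dataset, in their original order.
        self.data = [d for d in self.data if d.populated_ratio > threshold]
        # Return the dismissed entries so the caller can trace them.
        return removed


dataset = ToyDataset([ToyEntry("a", 0.0), ToyEntry("b", 0.4), ToyEntry("c", 1.0)])
removed = dataset.remove_empty_data(threshold=0.0)
print([d.name for d in removed])  # ['a']
print([d.name for d in dataset.data])  # ['b', 'c']
```

Returning the removed entries instead of discarding them keeps the filtering traceable, which is what the review asked for.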

@mathieudpnt mathieudpnt merged commit e272be4 into Project-OSmOSE:main Feb 23, 2026
1 check passed
@Gautzilla Gautzilla deleted the data-fill-rate branch February 23, 2026 16:27

2 participants