Skip to content

gmail: Use Label Get() in order to get better progress estimate.#6

Open
canet-benoit wants to merge 1 commit into
danmarg:masterfrom
canet-benoit:improve_backup_progress_computation
Open

gmail: Use Label Get() in order to get better progress estimate.#6
canet-benoit wants to merge 1 commit into
danmarg:masterfrom
canet-benoit:improve_backup_progress_computation

Conversation

@canet-benoit

Copy link
Copy Markdown
Contributor

The ResultSizeEstimate field is only an estimate.

This leads the backup progress computation to use a wrong total account of mails hence displaying bogus results for non small mailboxes.

This commit creates a method to be used to compute something nearing the mail count (the label count) in order to compute a better backup progress estimation.

This works by retrieving the Labels() then iterating on the resulting list and filtering only by the label names that make sense.

The two cases this patch implements are:

1 ) No label is to be used for filtering:

Then we take into account labels in the following set: { "DRAFT" , "INBOX", "SENT", "TRASH" }.

The rationale for this is that these labels fit 'real' mails contrary to the other labels.

Note that account of mail labelled "TRASH" must be substracted from the account computed above.

2 ) A label was specified in order to make a selective backup:

Then simply account the number of mails having this label.

Finally the messages sent to feed the progress display goroutine takes into account the relevant label count of each mails if we are into the 1) case.
Again relevant labels are in { "DRAFT" , "INBOX", "SENT", "TRASH" }.

The ResultSizeEstimate field is only an estimate.

This leads the backup progress computation to use a wrong total
account of mails hence displaying bogus results for non small
mailboxes.

This commit creates a method to be used to compute something
nearing the mail count (the label count) in order to compute
a better backup progress estimation.

This works by retrieving the Labels() then iterating on the
resulting list and filtering only by the label names that
make sense.

The two cases this patch implements are:

1 ) No label is to be used for filtering:

Then we take into account labels in the following set:
{ "DRAFT" , "INBOX", "SENT", "TRASH" }.

The rationale for this is that these labels fit 'real' mails
contrary to the other labels.

Note that account of mail labelled "TRASH" must be substracted
from the account computed above.

2 ) A label was specified in order to make a selective backup:

Then simply account the number of mails having this label.

Finally the messages sent to feed the progress display goroutine
takes into account the relevant label count of each mails if we
are into the 1) case.
Again relevant labels are in { "DRAFT" , "INBOX", "SENT", "TRASH" }.
@canet-benoit canet-benoit force-pushed the improve_backup_progress_computation branch from 39f646b to c189daa Compare February 16, 2023 13:18
@canet-benoit

Copy link
Copy Markdown
Contributor Author

Hi,

I have further patch series to write: fixing the
back-off timing strategy for example.

Do you plan to review this series or should I fork ?

Best regards

Benoît

@danmarg

danmarg commented Mar 9, 2023 via email

Copy link
Copy Markdown
Owner

@danmarg danmarg left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments/questions, but at a high level, I think I'm confused by the change. Is the main goal to show progress as "number of labels fully synced" vs "number of messages"?

Comment thread lib/gmail/gmail.go
return int(shard)
}

// This function return true if the given label parameter

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This function return true if the given label parameter
// This function returns true if the given label parameter

Comment thread lib/gmail/gmail.go

// This function return true if the given label parameter
// is not an extra (decorating) label.
func isToBeAccounted(label string) bool {

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be clearer to just declare a "const" (not really, since maps cannot be const) map at the top of this file and directly check for membership, a la:

var countableLabels = map[string]bool{
  "DRAFT": true,
  "INBOX": true,
  "SENT": true,
  "TRASH": true,
}

And then for usage, replace the function call with countableLabels[label]. I think the function indirection doesn't buy us much.

Comment thread lib/gmail/gmail.go
// Only these mails types are to take into account when
// retrieving a full mailbox without filtering by a given
// label.
type void struct{}

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I noted above, I think I'd just declare a map and access it directly. In addition, given the small size of this map, the optimization of using empty struct members seems to me not worth the slight loss in readability.

In any event, this function could be rewritten as:

Suggested change
type void struct{}
func isCounted(label string) bool {
_, r := map[string]struct{}{
"INBOX": struct{}{},
"DRAFT": struct{}{},
"SENT": struct{}{},
"TRASH": struct{}{}}[label]
return r
}

Comment thread lib/gmail/gmail.go
}

// This function takes a list of labels and count
// the number of real labels discarding extra labels.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// the number of real labels discarding extra labels.
// the number of countable labels discarding extra labels.

Comment thread lib/gmail/gmail.go
return count
}

func getLabelsCount(g *Gmail) (int64, error) {

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I don't understand the algorithm here. It seems like we are doing two things

  1. Getting the message count per labels for which we will download messages (which is either all "toBeAccounted" labels, or the label specified on the command line?).
  2. Getting the total count for all toBeAccounted labels.

And then displaying progress as "(1) divided by (2)".

I don't understand how this is correct? Isn't (1) the total number of matching messages?

Given the code that is updating the progress meter knows its current download counts, I thought the only error in displaying progress was that the denominator (i.e. returned from (2)) was wrong (prior to this PR).

Comment thread main.go
@danmarg

danmarg commented Mar 19, 2023

Copy link
Copy Markdown
Owner

Also, why wouldn't we just use https://developers.google.com/gmail/api/reference/rest/v1/users/getProfile to get the total message count for the given profile, if we want the total count of messages?

What confuses me, though, is I don't think we do want the total message count--we want the total unsynced message count. If we switch the progress to use messagesTotal from https://developers.google.com/gmail/api/reference/rest/v1/users/getProfile, we could do something like:

  • Count how many messages are already synced to the local mailbox
  • Count how many messages exist in the profile
  • Show progress as "new message" event counts (in the numerator) and "profile - local" in the denominator

What do you think? Does that make sense?

Edit: To expand on this a bit, I think there are the following two design options:

  1. Show progress as "number of operations completed out of the total"
  2. Show progress as "number of messages synced out of the total"

Currently we are doing (1), but it has inaccuracy since it uses the "estimated" count for (2).

We could switch to (2)--"message" progress--and get a fully accurate number of total messages on the server via the method I linked above. But, ironically, we don't currently know the total number of local (already synced) messages, so we'd have to get that either by a) storing it locally somewhere, or b) computing it via a prefix scan on the boltDB. And of course, (2) would also be a bit inaccurate, since it would be missing things like label changes and message deletes--thus, syncing (e.g.) delete ops would be omitted from the progress bar.

I'm sorta fine with (2). But I think you can simplify this, in that case, by simply a) querying the total number of messages on the server, b) querying the total number locally via a cache prefix scan, and c) showing progress as "number of message add ops" / (total on the server - total on the client).

How's that sound?

@canet-benoit

Copy link
Copy Markdown
Contributor Author

Hi,

Thanks a lot for the in depth review.

Using the user GetProfile() makes a lot more sense than my label accounting.

I will think about this in background during this week while I am on another task:
fomenting a migration toward Chrome Devices.

Any way I will get back to path Outtake later.

Best regards

Benoît

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants