gmail: Use Label Get() in order to get better progress estimate.#6
gmail: Use Label Get() in order to get better progress estimate.#6canet-benoit wants to merge 1 commit into
Conversation
The ResultSizeEstimate field is only an estimate.
This leads the backup progress computation to use a wrong total
account of mails hence displaying bogus results for non small
mailboxes.
This commit creates a method to be used to compute something
nearing the mail count (the label count) in order to compute
a better backup progress estimation.
This works by retrieving the Labels() then iterating on the
resulting list and filtering only by the label names that
make sense.
The two cases this patch implements are:
1 ) No label is to be used for filtering:
Then we take into account labels in the following set:
{ "DRAFT" , "INBOX", "SENT", "TRASH" }.
The rationale for this is that these labels fit 'real' mails
contrary to the other labels.
Note that account of mail labelled "TRASH" must be substracted
from the account computed above.
2 ) A label was specified in order to make a selective backup:
Then simply account the number of mails having this label.
Finally the messages sent to feed the progress display goroutine
takes into account the relevant label count of each mails if we
are into the 1) case.
Again relevant labels are in { "DRAFT" , "INBOX", "SENT", "TRASH" }.
39f646b to
c189daa
Compare
|
Hi, I have further patch series to write: fixing the Do you plan to review this series or should I fork ? Best regards Benoît |
|
I’ll try to get to this. I had a few suggested rewrites for this patch,
which held me up reviewing. Thanks for your patience. :)
…On Thu, 9 Mar 2023 at 15:06, canet-benoit ***@***.***> wrote:
Hi,
I have further patch series to write: fixing the
back-off timing strategy for example.
Do you plan to review this series or should I fork ?
Best regards
Benoît
—
Reply to this email directly, view it on GitHub
<#6 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAOW7C4DOFQT3SEUETA76X3W3HPWPANCNFSM6AAAAAAU6ECZ2Y>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
danmarg
left a comment
There was a problem hiding this comment.
I left a few comments/questions, but at a high level, I think I'm confused by the change. Is the main goal to show progress as "number of labels fully synced" vs "number of messages"?
| return int(shard) | ||
| } | ||
|
|
||
| // This function return true if the given label parameter |
There was a problem hiding this comment.
| // This function return true if the given label parameter | |
| // This function returns true if the given label parameter |
|
|
||
| // This function return true if the given label parameter | ||
| // is not an extra (decorating) label. | ||
| func isToBeAccounted(label string) bool { |
There was a problem hiding this comment.
I think it might be clearer to just declare a "const" (not really, since maps cannot be const) map at the top of this file and directly check for membership, a la:
var countableLabels = map[string]bool{
"DRAFT": true,
"INBOX": true,
"SENT": true,
"TRASH": true,
}
And then for usage, replace the function call with countableLabels[label]. I think the function indirection doesn't buy us much.
| // Only these mails types are to take into account when | ||
| // retrieving a full mailbox without filtering by a given | ||
| // label. | ||
| type void struct{} |
There was a problem hiding this comment.
As I noted above, I think I'd just declare a map and access it directly. In addition, given the small size of this map, the optimization of using empty struct members seems to me not worth the slight loss in readability.
In any event, this function could be rewritten as:
| type void struct{} | |
| func isCounted(label string) bool { | |
| _, r := map[string]struct{}{ | |
| "INBOX": struct{}{}, | |
| "DRAFT": struct{}{}, | |
| "SENT": struct{}{}, | |
| "TRASH": struct{}{}}[label] | |
| return r | |
| } |
| } | ||
|
|
||
| // This function takes a list of labels and count | ||
| // the number of real labels discarding extra labels. |
There was a problem hiding this comment.
| // the number of real labels discarding extra labels. | |
| // the number of countable labels discarding extra labels. |
| return count | ||
| } | ||
|
|
||
| func getLabelsCount(g *Gmail) (int64, error) { |
There was a problem hiding this comment.
I'm afraid I don't understand the algorithm here. It seems like we are doing two things
- Getting the message count per labels for which we will download messages (which is either all "toBeAccounted" labels, or the label specified on the command line?).
- Getting the total count for all toBeAccounted labels.
And then displaying progress as "(1) divided by (2)".
I don't understand how this is correct? Isn't (1) the total number of matching messages?
Given the code that is updating the progress meter knows its current download counts, I thought the only error in displaying progress was that the denominator (i.e. returned from (2)) was wrong (prior to this PR).
|
Also, why wouldn't we just use https://developers.google.com/gmail/api/reference/rest/v1/users/getProfile to get the total message count for the given profile, if we want the total count of messages? What confuses me, though, is I don't think we do want the total message count--we want the total unsynced message count. If we switch the progress to use
What do you think? Does that make sense? Edit: To expand on this a bit, I think there are the following two design options:
Currently we are doing (1), but it has inaccuracy since it uses the "estimated" count for (2). We could switch to (2)--"message" progress--and get a fully accurate number of total messages on the server via the method I linked above. But, ironically, we don't currently know the total number of local (already synced) messages, so we'd have to get that either by a) storing it locally somewhere, or b) computing it via a prefix scan on the boltDB. And of course, (2) would also be a bit inaccurate, since it would be missing things like label changes and message deletes--thus, syncing (e.g.) delete ops would be omitted from the progress bar. I'm sorta fine with (2). But I think you can simplify this, in that case, by simply a) querying the total number of messages on the server, b) querying the total number locally via a cache prefix scan, and c) showing progress as "number of message add ops" / (total on the server - total on the client). How's that sound? |
|
Hi, Thanks a lot for the in depth review. Using the user GetProfile() makes a lot more sense than my label accounting. I will think about this in background during this week while I am on another task: Any way I will get back to path Outtake later. Best regards Benoît |
The ResultSizeEstimate field is only an estimate.
This leads the backup progress computation to use a wrong total account of mails hence displaying bogus results for non small mailboxes.
This commit creates a method to be used to compute something nearing the mail count (the label count) in order to compute a better backup progress estimation.
This works by retrieving the Labels() then iterating on the resulting list and filtering only by the label names that make sense.
The two cases this patch implements are:
1 ) No label is to be used for filtering:
Then we take into account labels in the following set: { "DRAFT" , "INBOX", "SENT", "TRASH" }.
The rationale for this is that these labels fit 'real' mails contrary to the other labels.
Note that account of mail labelled "TRASH" must be substracted from the account computed above.
2 ) A label was specified in order to make a selective backup:
Then simply account the number of mails having this label.
Finally the messages sent to feed the progress display goroutine takes into account the relevant label count of each mails if we are into the 1) case.
Again relevant labels are in { "DRAFT" , "INBOX", "SENT", "TRASH" }.