-
Notifications
You must be signed in to change notification settings - Fork 1.3k
NUTCH-3130 Address deprecated API usage across Nutch codebase and build #869
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| int threshold = Math.min(3, searched.length() / TRESHOLD_DIVIDER); | ||
| for (int i = 0; i < normalized.length && value == null; i++) { | ||
| if (StringUtils.getLevenshteinDistance(searched, normalized[i]) < threshold) { | ||
| if (StringUtils.compareIgnoreCase(searched, normalized[i]) < threshold) { //.getLevenshteinDistance(searched, normalized[i]) < threshold) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to investigate whether this is a suitable replacement and also need to remove thie comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SpellCheckedMetadata is used only by protocol-http and protocol-httpclient. We could deprecate it, use CaseInsensitiveMetadata instead (see NUTCH-3002) and later remove the class SpellCheckedMetadata entirely. Nowadays, spell-checking HTTP headers sounds odd, while 20 years ago it might have been a good idea.
Changing the behavior in opposite to the name does not seem the right way.
If we want to keep the class, we need to use LevenshteinDistance.
|
|
||
| @Override | ||
| protected void finalize() throws Throwable { | ||
| shutDown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we can simply remove the call to shutdown. I need to further investigate options and confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for me.
|
|
||
| @Override | ||
| protected void finalize() { | ||
| /** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also not sure on this right now. I need to do some more research.
|
I did want to also note, there are 15 instances of the @deprecated annotation across 9 files in the codebase: I'm not sure whether is the correct time to be removing these methods but if we decide to keep them we should definitely augment the Javadoc to detail the replacement API usage. |
sebastian-nagel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Thanks, @lewismc!
Some work still to be done. Let me know whether I shall take over.
| int threshold = Math.min(3, searched.length() / TRESHOLD_DIVIDER); | ||
| for (int i = 0; i < normalized.length && value == null; i++) { | ||
| if (StringUtils.getLevenshteinDistance(searched, normalized[i]) < threshold) { | ||
| if (StringUtils.compareIgnoreCase(searched, normalized[i]) < threshold) { //.getLevenshteinDistance(searched, normalized[i]) < threshold) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SpellCheckedMetadata is used only by protocol-http and protocol-httpclient. We could deprecate it, use CaseInsensitiveMetadata instead (see NUTCH-3002) and later remove the class SpellCheckedMetadata entirely. Nowadays, spell-checking HTTP headers sounds odd, while 20 years ago it might have been a good idea.
Changing the behavior in opposite to the name does not seem the right way.
If we want to keep the class, we need to use LevenshteinDistance.
|
|
||
| @Override | ||
| protected void finalize() throws Throwable { | ||
| shutDown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for me.
| installExtensions(this.fRegisteredPlugins); | ||
| } catch (PluginRuntimeException e) { | ||
| LOG.error("Could not install extensions.", e.toString()); | ||
| LOG.error("Could not install extensions. {}", e.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Or: LOG.error("Could not install extensions:", e);
|
I didn't see you had responded here @sebastian-nagel . I will revisit this PR, update it and perform some more testing. |
Proposed fix for https://issues.apache.org/jira/browse/NUTCH-3130
This is a fairly straightforward PR. Some grunt work involved...
I took a bit of time to investigate issues with and options for removing instances of
@override finalize()... this may need more investigation and certainly some testing to make sure this PR doesn't break any behavior.