Commit db2859e
Give Wtp `linktrailing_re` attribute
See wiktextract issue #1604
tatuylonen/wiktextract#1604
https://en.wikipedia.org/wiki/Help:Wikitext#Blend_link
This adds a new attribute to Wtp that contains a `re.Pattern`
object used for pattern-matching these kinds of suffixed links.
Set `Wtp.linktrailing_re` to match how the Wikimedia project
being parsed handles link trails.
English uses `[a-z]+`.
Our default implementation uses `\w+`, which should be fine
most of the time.
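
As a rough illustration of what such a pattern does, here is a minimal sketch that absorbs the text following a closed link into the link's display text. The helper `apply_link_trail` and the two module-level constants are hypothetical names for this example only, not part of Wtp; only the two regular expressions come from the message above.

```python
import re
from typing import Tuple

# The two patterns named in the commit message (constant names are illustrative).
DEFAULT_TRAIL_RE = re.compile(r"\w+")     # default described above
ENGLISH_TRAIL_RE = re.compile(r"[a-z]+")  # what English projects use


def apply_link_trail(
    link_text: str, following: str, trail_re: "re.Pattern[str]"
) -> Tuple[str, str]:
    """Return (display_text, remaining_text) after absorbing the link trail.

    `following` is the text that comes immediately after the closing `]]`.
    """
    m = trail_re.match(following)
    if m is None:
        # Nothing to absorb: the link text is left unchanged.
        return link_text, following
    return link_text + m.group(0), following[m.end():]


# `[[bus]]es stop here` renders as a link reading "buses".
print(apply_link_trail("bus", "es stop here", ENGLISH_TRAIL_RE))
```

With plain ASCII input like this, both patterns behave the same way; they only diverge once non-ASCII word characters follow the link.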
Languages written without spaces seem to use the English `[a-z]+`,
which makes sense: in `[[englishword]]KANJI` the kanji characters
should not be consumed into the link, but `\w+` would consume them.
1 parent ecb885e commit db2859e
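
The kanji case from the commit message can be checked with the standard `re` module alone. This is a small sketch, assuming Python 3 `str` patterns, where `\w` matches Unicode word characters; the variable names are illustrative and nothing here calls into Wtp.

```python
import re

word_trail = re.compile(r"\w+")      # the default pattern
ascii_trail = re.compile(r"[a-z]+")  # the English-style pattern

# Text that comes right after `[[englishword]]` in a language without spaces.
following = "漢字 text"

m_word = word_trail.match(following)
m_ascii = ascii_trail.match(following)

# `\w+` swallows the kanji into the link trail; `[a-z]+` leaves them alone.
print(m_word.group(0) if m_word else None)
print(m_ascii.group(0) if m_ascii else None)
```

A project whose text runs directly on from a link would therefore want the narrower pattern assigned to `linktrailing_re`.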
3 files changed
Lines changed: 16 additions & 1 deletion