add letters regex to match for more non ASCII chars#51
Conversation
|
Hey @faxemaxee, glad to see you again. :D This is really a very tricky issue and that's why I took some time to analyze it. I have been wanting to do this, but wanted to avoid the handpicking of unicode characters to achieve this. This two broad issues with these are:
I was looking to use Unicode Property Escapes to achieve this result such that a regex like As of now I am concerned about the support of this in browsers and the performance. A quick test on regex101 seem promising but needs more effort. |
|
Hey @tarunbatra, I totally get your concern, I basically had the same. AFAIK I did some research and found this: https://github.com/slevithan/xregexp But I am fine with not accepting this PR when it is not meeting your standards regarding this issue. :) |
|
Hey @faxemaxee, sorry for late reply. The browser support doesn't look very bad to me as it seems every major browser except IE (surprise) and Firefox Android supports Unicode Property Escapes. I do agree with you that something is better than nothing but at this point I feel like I'd rather wait or use a library like the one you suggested. I understand that this feature is important for you and I'm sorry this couldn't be shipped. I appreciate the effort you have put in and hope you'll be able to use a fork of the library until we can ship it. :) |
Hey, it's me again! :D
I realised that the letters rule only checks for a very small amount of chars thus causing issues in our 14 language support product. Me, as a German cannot use Umlaute (
ä, ö, ü) or other special letters likeß. After debugging and reading up on regex I decided to make another PR included SOME but not all charsets I would like to see in the package. I also updated the readme with the necessary information about the supported chars.Regarding tests. First and foremost I wanted to make sure that the general behaviour did not change and added negative test for all the other rules. I then started to make complete checks for the other charsets (not included in the PR) and quickly realises that something like
letters(128)took several seconds to finish (we should consider capping the count value here, maybe 10 or so, everything above 5 sounds unreasonable to me, but who knows). And those 128 were only the greek and coptic alphabet. Not even starting with the CJK set... :DOkay, let me know what you think!