Hi all,

I’ve implemented an approach of encoding unsafe characters in cache names for persistent storage directories. You can find it at https://github.com/gridgain/apache-ignite/tree/ignite-7264.

How it works now:
1) all characters outside of the [a-zA-Z0-9_-] class are replaced with their hex value (seems to be the easiest way);
2) a hash of the cache name is added at the end of the name to avoid case-insensitive collisions.

There is still a tiny chance of hitting two cache names that are equal ignoring case and also have the same hash, but that’s really unlikely.

It seems that there are no complications with this approach. The cache name to directory mapping looks like this:

mycache -> cache-mycache-f19fd83d
my/cool/cache -> cache-my2fcool2fcache
my!@#$%^&()cache -> cache-my21402324255e262829cache-84ba3e99

It turns out that persistence is not the only place that doesn’t like special symbols in cache names: I also got an exception from MBean registration when creating a cache with ‘*’ or ‘?’. Filed https://issues.apache.org/jira/browse/IGNITE-7334 for that.

Please let me know if you have any comments.

Thanks,
Stan

From: Stanislav Lukyanov
Sent: December 25, 2017, 18:09
To: [hidden email]
Subject: Handling slashes in cache names

Hi all,

I’m looking into https://issues.apache.org/jira/browse/IGNITE-7264, and I need some guidance on the best way to approach it.

The problem is that cache names are not restricted, but if persistence is enabled the cache needs to have a corresponding directory on the file system (“cache-…”), which can’t be created if the cache name contains certain characters (or is a reserved system name).

A straightforward approach would be to check whether a cache name is allowed on the local system (e.g. via `Paths.get(name)`) and fail to create the cache if it isn’t, but I’m a bit concerned about the consistency of the behavior (the same cache name may be allowed on one system and not on another).

I think a better way would be to replace special characters (say, all non-alphanumeric characters) with underscores in file names (not changing the cache configuration). Would this be OK? Are there any risks I’m not considering?

WDYT?

Thanks,
Stan
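For illustration, the two-step encoding described at the top of this message (hex-encode everything outside [a-zA-Z0-9_-], then append a hash) could be sketched in Java roughly as follows. This is only a sketch under assumptions, not the code from the ignite-7264 branch: the class and method names are hypothetical, and `String.hashCode()` is a stand-in for whatever hash the branch actually uses, so the suffixes will not match the ones listed above.

```java
final class CacheDirNameSketch {
    /** Replaces every char outside [a-zA-Z0-9_-] with its hex code; safe chars pass through. */
    static String escape(String cacheName) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cacheName.length(); i++) {
            char c = cacheName.charAt(i);
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_' || c == '-';
            if (safe)
                sb.append(c);
            else
                sb.append(Integer.toHexString(c)); // e.g. '/' becomes "2f"
        }
        return sb.toString();
    }

    /** ASSUMPTION: String.hashCode() is only a placeholder for the real hash. */
    static String dirName(String cacheName) {
        String hash = String.format("%08x", cacheName.hashCode());
        return "cache-" + escape(cacheName) + "-" + hash;
    }

    public static void main(String[] args) {
        String[] names = {"mycache", "MyCache", "my/cool/cache", "my!@#$%^&()cache"};
        for (String name : names)
            System.out.println(name + " -> " + dirName(name));
    }
}
```

In this sketch the hash is computed from the original, case-sensitive name, so "mycache" and "MyCache" get different suffixes even on a case-insensitive file system.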
On Thu, Dec 28, 2017 at 9:22 AM, Stanislav Lukyanov <[hidden email]> wrote:

> How it works now is: 1) all characters outside of the [a-zA-Z0-9_-] class
> are replaced with their hex value (seems to be the easiest way);

I would surround such replacements with "_", e.g. "myCacheName_somesymbol_".

> 2) a hash of the cache name is added at the end of the name to avoid
> case-insensitive collisions.
> There is still a tiny chance of hitting two cache names that are equal
> ignoring case which also have the same hash, but that’s really unlikely.

Here I am confused. I think the cache names should be case insensitive at all times. I seriously doubt enforcing this rule would cause problems. If we enforce this rule at cache creation time, then we would not have to add a hashcode at the end.

> The cache name to directory mapping is like
> mycache -> cache-mycache-f19fd83d
> my/cool/cache -> cache-my2fcool2fcache

As mentioned above, I would prefer "cache-my_2f_cool_2f_cache".
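The underscore-delimited variant suggested in this reply could look roughly like the Java sketch below. The class and method names are hypothetical and nothing here is taken from the actual branch; it only shows how "my/cool/cache" would come out as "my_2f_cool_2f_cache".

```java
final class DelimitedEscapeSketch {
    /** Hex-encodes unsafe chars and wraps each code in underscores, e.g. '/' -> "_2f_". */
    static String escape(String cacheName) {
        StringBuilder sb = new StringBuilder();
        for (char c : cacheName.toCharArray()) {
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_' || c == '-';
            if (safe)
                sb.append(c);
            else
                sb.append('_').append(Integer.toHexString(c)).append('_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("cache-" + escape("my/cool/cache"));
        System.out.println("cache-" + escape("a!@b"));
    }
}
```

The delimiters make each escape easy to spot, but since "_" is itself an allowed character, a cache whose literal name already looks like an escaped one would produce the same output.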
> I would surround such replacements with "_", e.g. "myCacheName_somesymbol_".

Looks nice, will do.

> Here I am confused. I think the cache names should be case insensitive at
> all times. I seriously doubt enforcing this rule would cause problems. If
> we enforce this rule at cache creation time, then we would not have to add
> a hashcode at the end.

I think I would still keep the hashcode. E.g. I’m now also truncating names longer than 255 chars, and the truncated names could be equal. There could be more edge cases, and adding an imprint of the identity might help to avoid them. The names are readable enough with the hashes, but scary enough for users not to mess with them manually - I guess that’s a good thing :)

Making cache names always case-insensitive sounds good, but I’d separate it into another JIRA issue (it has a larger compatibility impact, it affects a different part of the code base, etc.). Is that OK?

Thanks,
Stan

From: Dmitriy Setrakyan
Sent: December 28, 2017, 22:33
To: [hidden email]
Subject: Re: Handling slashes in cache names
On Fri, Dec 29, 2017 at 2:28 AM, Stanislav Lukyanov <[hidden email]> wrote:

> Making cache names always case-insensitive sounds good, but I’d separate it
> to another JIRA issue (it has larger compatibility impact, it affects a
> different part of the code base, etc). Is it OK?

Well, having to support multiple cache name formats going forward will be difficult. I would rather we finalize it right now. My preference would be to limit names to 255 characters right now and make cache names case insensitive. I doubt such a change would affect many users, but it would definitely make things cleaner.

It would be nice to hear what others in the community think. Vladimir O., Alexey G.?

D.
Let me return to this issue.

> Well, having to support multiple cache name formats going forward will be
> difficult.

I don’t think there is a question of multiple name formats. Let’s just say that there are issues that can be solved at the base cache level (e.g. making cache names always case-insensitive) and there are issues that have to be solved by the PDS (e.g. special and non-ASCII symbols that we don’t want to always ban from names). I’m not suggesting introducing anything into PDS that will afterwards be handled by the base cache code. We’ll just handle some issues first, in PDS, and other issues will be handled separately.

> My preference would be to limit to 255 characters right now

That would be good, but it doesn’t really solve the issue with the length. Since non-ASCII characters (and non-alphanumeric ASCII) are encoded, the actual length of a cache’s directory name may be greater than the length of the cache name (and don’t forget the “cache-” prefix). We could come up with a “really safe” limit, but it might be too small (around 80?), and that would be limiting the API based on a rather arbitrary implementation detail.

Another reason why I like having a hash in the file name is that we might run into problems with two names, one of which is an escaped version of the other, like “my/cache” and “my_2f_cache”. And I guess there can be more similar collisions that we just haven’t thought of yet. Having a hash in the name just works as a (probabilistic) failsafe for that.

Thanks,
Stan

From: Dmitriy Setrakyan
Sent: January 2, 2018, 16:40
To: [hidden email]
Subject: Re: Handling slashes in cache names
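The collision and length concerns above can be made concrete with a small Java sketch. It assumes the underscore-delimited escaping discussed earlier, a 255-character limit on the directory name, and `String.hashCode()` as a placeholder hash; all names are hypothetical and this is not the code from the branch.

```java
final class TruncatedDirNameSketch {
    static final String PREFIX = "cache-";
    static final int MAX_LEN = 255; // assumed limit, the figure mentioned in the thread

    /** Hex-encodes unsafe chars, wrapping each code in underscores. */
    static String escape(String cacheName) {
        StringBuilder sb = new StringBuilder();
        for (char c : cacheName.toCharArray()) {
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_' || c == '-';
            if (safe)
                sb.append(c);
            else
                sb.append('_').append(Integer.toHexString(c)).append('_');
        }
        return sb.toString();
    }

    /** Truncates the escaped name so that prefix + name + "-" + hash fits into MAX_LEN. */
    static String dirName(String cacheName) {
        // ASSUMPTION: placeholder hash, computed from the original (unescaped) name.
        String hash = String.format("%08x", cacheName.hashCode());
        int room = MAX_LEN - PREFIX.length() - 1 - hash.length();
        String escaped = escape(cacheName);
        if (escaped.length() > room)
            escaped = escaped.substring(0, room);
        return PREFIX + escaped + "-" + hash;
    }

    public static void main(String[] args) {
        // Both names escape to the same string; only the hash suffix can tell them apart.
        System.out.println(escape("my/cache") + " vs " + escape("my_2f_cache"));
        System.out.println(dirName("my/cache"));
        System.out.println(dirName("my_2f_cache"));
    }
}
```

Without the suffix, “my/cache” and “my_2f_cache” map to the same directory, and two long names that differ only after the truncation point do as well. Each escaped non-ASCII character also grows to several characters, which is why a limit on the cache name alone does not bound the directory name.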
Agree that cache names should be case insensitive - currently it seems that we have issues on Windows OS.

As far as allowed characters go - why don't we try creating a directory on all nodes (but calling toLower() prior to creation)? If creation succeeds everywhere, then the cache name is acceptable. New nodes should throw an exception if folder creation is impossible.

I don't like escaping since it will not add any usability for, let's say, Chinese or Russian names. For example, MySQL supports ASCII: [0-9,a-z,A-Z$_] (basic Latin letters, digits 0-9, dollar, underscore) and Extended: U+0080 .. U+FFFF [1]

I would also consider some intersection of the allowed file name characters in different file systems [2]

[1] https://dev.mysql.com/doc/refman/5.7/en/identifiers.html
[2] https://en.wikipedia.org/wiki/Filename

Yakov Zhdanov
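A per-node check along the lines proposed here might look like the following Java sketch. It is only an illustration under assumptions: the class and method names are hypothetical, `toLowerCase(Locale.ROOT)` stands in for the toLower() call mentioned above, and the extra parent check (rejecting names that would resolve to a nested path) is an addition of this sketch rather than part of the proposal.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

final class DirProbeSketch {
    /** Returns true if "cache-<lowercased name>" exists or can be created directly under workDir. */
    static boolean canCreateCacheDir(Path workDir, String cacheName) {
        try {
            Path dir = workDir.resolve("cache-" + cacheName.toLowerCase(Locale.ROOT));
            // A name containing a path separator would resolve to a nested path; reject it.
            if (!workDir.equals(dir.getParent()))
                return false;
            if (Files.isDirectory(dir))
                return true;
            Files.createDirectory(dir);
            return true;
        }
        catch (InvalidPathException | IOException e) {
            // The local file system does not accept this name.
            return false;
        }
    }

    public static void main(String[] args) {
        Path workDir = Paths.get(System.getProperty("java.io.tmpdir"));
        for (String name : new String[] {"MyCache", "my/cool/cache", "my*cache?"})
            System.out.println(name + " -> " + canCreateCacheDir(workDir, name));
    }
}
```

The result depends on the file system of the node running the check, so the same name can pass on one node and fail on another.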
On Mon, Jan 15, 2018 at 5:47 PM, Yakov Zhdanov <[hidden email]> wrote:
> try creating a directory on all nodes

And then a new node appears with a different kind of file system...
Escaping removes all limitations and does not affect usability.

Pavel
>> And then a new node appears with a different kind of file system..

This is hardly possible. And I suggest not to

>> Escaping removes all limitations and does not affect usability.

Disagree. You will never ever relate smth like "fdee0456adcc" to "мои_данные".

Guys, I just realized that we create a folder for each cache group. How about we choose the group ID for the folder name and put a text file cachegroup.info containing the group name into it?

--Yakov
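The folder-per-group idea could be sketched in Java roughly as below. Only the cachegroup.info file name comes from the message above; the class and method names, the hex formatting of the ID, and the UTF-8 encoding of the file are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class GroupDirSketch {
    static final String INFO_FILE = "cachegroup.info";

    /** Creates a folder named after the group ID and stores the readable group name inside it. */
    static Path createGroupDir(Path workDir, int groupId, String groupName) {
        try {
            // ASSUMPTION: the folder name is the hex form of the ID; the thread does not fix a format.
            Path dir = workDir.resolve(Integer.toHexString(groupId));
            Files.createDirectories(dir);
            Files.write(dir.resolve(INFO_FILE), groupName.getBytes(StandardCharsets.UTF_8));
            return dir;
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the group name back from the info file. */
    static String readGroupName(Path groupDir) {
        try {
            return new String(Files.readAllBytes(groupDir.resolve(INFO_FILE)), StandardCharsets.UTF_8);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path workDir = Paths.get(System.getProperty("java.io.tmpdir"), "group-dir-sketch");
        Path dir = createGroupDir(workDir, 42, "my/cool/group");
        System.out.println(dir + " holds group " + readGroupName(dir));
    }
}
```

With this layout the folder name never contains user-supplied characters, and the original name, whatever alphabet it uses, is still recoverable from the text file.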
On Mon, Jan 15, 2018 at 6:22 PM, Yakov Zhdanov <[hidden email]> wrote:
> You will never ever relate smth like "fdee0456adcc" to "мои_данные".

As a user, why do I need to understand file names in the Ignite work directory?
To understand how much storage you need for cache group "X" and watch the trends.

Anyway, folder named by ID and txt file inside should do the trick =)

--Yakov
In reply to this post by Pavel Tupitsyn
On Mon, Jan 15, 2018 at 7:11 AM, Pavel Tupitsyn <[hidden email]> wrote:

> > try creating a directory on all nodes
> And then a new node appears with a different kind of file system..

If a new node cannot create an existing cache, it should not be allowed to start.
In reply to this post by Pavel Tupitsyn
On Mon, Jan 15, 2018 at 7:31 AM, Pavel Tupitsyn <[hidden email]> wrote:

> > You will never ever relate smth like "fdee0456adcc" to "мои_данные".
>
> As a user, why do I need to understand file names in Ignite work directory?

Because it is better to have an understandable and human-readable directory structure than not. Let's do it right.
> folder named by ID and txt file inside should do the trick

Agree

On Tue, Jan 16, 2018 at 1:02 PM, Dmitriy Setrakyan <[hidden email]> wrote:

> Because it is better to have an understandable and human readable directory
> structure than not. Let's do it right.
How about using both escaping and a text file with the name?

One can think of the escaped name as a kind of ID, which happens to be human-readable when the name is in ASCII, and as unreadable as a UUID when the name is in UTF (i.e. contains non-ASCII characters). This way we have all the readability in the common case (when the name is all English letters and digits), and some limited readability (via looking into text files) when other alphabets are used.

Thanks,
Stan

From: Pavel Tupitsyn
Sent: January 16, 2018, 14:01
To: [hidden email]
Subject: Re: Handling slashes in cache names
>> How about using both escaping and a text file with the name? One can
>> think of the escaped name as of a kind of ID, which happens to be
>> human-readable when the name is in ASCII, and as unreadable as an UUID
>> when the name is in UTF. This way we have all the readability in the
>> common case (when name is all English letters and digits), and some
>> limited readability (via looking into text files) when other alphabets
>> are used.

Sounds good to me.

--Yakov