Handling slashes in cache names

Stanislav Lukyanov
Hi all,

I’m looking into https://issues.apache.org/jira/browse/IGNITE-7264, and I need some guidance on what’s the best way to approach it.

The problem is that cache names are not restricted, but if persistence is enabled the cache needs to have a corresponding directory on the file system (“cache-…”), which can’t be created if the cache name contains certain characters (or is a reserved system name).

A straightforward approach would be to check whether a cache name is allowed on the local system (e.g. via `Paths.get(name)`) and fail to create the cache if it isn’t, but I’m a bit concerned about the consistency of the behavior (the same cache name would be allowed on one system and not on another).
I think a better way would be to replace special characters (say, all non-alphanumeric characters) with underscores in file names (not changing the cache configuration). Would this be OK? Are there any risks I’m not considering?
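
A minimal sketch of the two options above, for illustration only. The class and method names are hypothetical and not part of Ignite; the "cache-" prefix is taken from the description above. It shows two things worth weighing: `Paths.get` gives a platform-dependent answer (and accepts "a/b" as a nested path rather than rejecting it), and replacing characters with underscores can map distinct cache names to the same directory.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

/** Hypothetical helper, not Ignite code. */
final class CacheDirNameSketch {
    /** Option 1: ask the local default file system whether the string parses as a path. */
    static boolean isValidOnLocalFs(String cacheName) {
        try {
            Paths.get("cache-" + cacheName);
            return true;
        }
        catch (InvalidPathException e) {
            // The set of rejected strings differs between operating systems.
            return false;
        }
    }

    /** Option 2: replace every character outside [A-Za-z0-9] with an underscore. */
    static String sanitize(String cacheName) {
        return "cache-" + cacheName.replaceAll("[^A-Za-z0-9]", "_");
    }
}
```

With this replacement rule, the names "a/b", "a b" and "a_b" all end up as "cache-a_b", so some collision handling would be needed on top of it.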

WDYT?

Thanks,
Stan

Re: Handling slashes in cache names

dsetrakyan
My preference would be to prohibit forward and backward slashes in cache
names altogether, as they may create a false feeling of some directory
structure, which does not exist. We should also prohibit spaces as well.

D.

On Mon, Dec 25, 2017 at 7:09 AM, Stanislav Lukyanov <[hidden email]>
wrote:

> Hi all,
>
> I’m looking into https://issues.apache.org/jira/browse/IGNITE-7264, and I
> need some guidance on what’s the best way to approach it.
>
> The problem is that cache names are not restricted, but if persistence is
> enabled the cache needs to have a corresponding directory on the file
> system (“cache-…”) which can’t be created if the cache name contains
> certain characters (or a reserved system name).
>
> A straightforward approach would be to check if a cache name is allowed on
> the local system (e.g. via `Paths.get(name)`) and fail to create cache if
> it isn’t, but I’m a bit concerned with the consistency of the behavior (the
> same cache name be allowed on one system and not on another).
> I think a better way would be to replace special characters (say, all
> non-alphanumeric characters) with underscores in file names (not changing
> the cache configuration). Would this be OK? Are there any risks I’m not
> considering?
>
> WDYT?
>
> Thanks,
> Stan
>
Reply | Threaded
Open this post in threaded view
|

Re: Handling slashes in cache names

Alexey Kuznetsov
It also makes sense to limit the cache name length to something reasonable,
because some file systems have limits on path length.
See: https://en.wikipedia.org/wiki/Filename#Length_restrictions

On Tue, Dec 26, 2017 at 1:41 AM, Dmitriy Setrakyan <[hidden email]>
wrote:

> My preference would be to prohibit forward and backward slashes in cache
> names altogether, as they may create a false feeling of some directory
> structure, which does not exist. We should also prohibit spaces as well.
>
> D.



--
Alexey Kuznetsov

RE: Handling slashes in cache names

Stanislav Lukyanov
Thanks for the feedback.

It seems that another thing to handle is case-insensitive file systems – “mycache” and “MyCache” are the same on Windows, so it might be reasonable to disallow having two caches with names that are equal ignoring case.
And one more thing is control characters – forbidding at least the ASCII range 0x00-0x20 seems reasonable.

To summarize, a possible set of restrictions would be
- Whitespace characters (via Character.isWhitespace)
- Control characters (via Character.isISOControl)
- Slashes
- Characters reserved in Windows (<>:"/\|?*)
- Length (say, up to 255)
- Distinct names of caches when ignoring case
It seems reasonable to enforce that even regardless of persistence directories naming (AFAIU that’s what Dmitry meant by forbidding things altogether), so that’s what I’m going to do.
Any concerns?
Specifically, would it be OK from a backward compatibility point of view to forbid all these characters now for all caches?
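
The restrictions listed above could be sketched roughly as follows. This is an illustration under assumptions, not Ignite code: the class name, the `register` method and its return convention are hypothetical, the length limit is counted in Java `char`s rather than file system bytes, and `Character.isWhitespace` / `Character.isISOControl` are the standard `java.lang.Character` methods.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical validator for the proposed cache name rules. */
final class CacheNameRules {
    /** Characters reserved in Windows file names; includes both slashes. */
    private static final String WINDOWS_RESERVED = "<>:\"/\\|?*";

    /** Assumed limit, following the "say, up to 255" suggestion. */
    private static final int MAX_LEN = 255;

    /** Lower-cased names accepted so far, for the case-insensitive uniqueness rule. */
    private final Set<String> seen = new HashSet<>();

    /** Returns null if the name is accepted, otherwise a description of the violated rule. */
    String register(String name) {
        if (name.length() > MAX_LEN)
            return "longer than " + MAX_LEN + " characters";

        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);

            if (Character.isWhitespace(c))
                return "whitespace at index " + i;

            if (Character.isISOControl(c))
                return "control character at index " + i;

            if (WINDOWS_RESERVED.indexOf(c) >= 0)
                return "reserved character '" + c + "' at index " + i;
        }

        // Locale.ROOT keeps the comparison independent of the default locale.
        if (!seen.add(name.toLowerCase(Locale.ROOT)))
            return "duplicates an existing name when ignoring case";

        return null;
    }
}
```

For example, after `register("mycache")` succeeds, `register("MyCache")` is rejected by the last rule.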

Thanks,
Stan


From: Alexey Kuznetsov
Sent: December 26, 2017, 7:51
To: [hidden email]
Subject: Re: Handling slashes in cache names

It also make sense to limit cache name length to reasonable length.
Because some File systems could have limitations on path length.
See: https://en.wikipedia.org/wiki/Filename#Length_restrictions


Re: Handling slashes in cache names

dsetrakyan
Looks good to me. Is this going to be an exception on startup? If yes, is
it safe to release it, or should we wait till 3.0?

On Tue, Dec 26, 2017 at 2:08 AM, Stanislav Lukyanov <[hidden email]>
wrote:

> Thanks for the feedback.
>
> It seems that another thing to handle is case-insensitive FS – “mycache”
> and “MyCache” is the same on Windows, so it might be reasonable to disallow
> having two caches with names that are equal ignoring case.
> And one more thing is control characters – forbidding at least range of
> ASCII 0x00-0x20 seems reasonable.
>
> To summarize, a possible set of restrictions would be
> - Whitespace characters (via Character.isWhitespace)
> - Control characters (via Character.isISOControl)
> - Slashes
> - Characters reserved in Windows (<>:"/\|?*)
> - Length (say, up to 255)
> - Distinct names of caches when ignoring case
> It seems reasonable to enforce that even regardless of persistence
> directories naming (AFAIU that’s what Dmitry meant by forbidding things
> altogether), so that’s what I’m going to do.
> Any concerns?
> Specifically, would it be OK from backward compatibility point of view to
> forbid all these characters now for all caches?
>
> Thanks,
> Stan

Re: Handling slashes in cache names

Igor Sapego-2
There are also some international features that you might want to
address. For example, instead of backslash some other characters
may be used on Windows - ¥ on the Japanese version, ₩ on the
Korean version.
See [1] for more info.

Here is the citation:
Security Considerations for Character Sets in File Names
Windows code page and OEM character sets used on
Japanese-language systems contain the Yen symbol (¥) instead of
a backslash (\). Thus, the Yen character is a prohibited character for
NTFS and FAT file systems. When mapping Unicode to
a Japanese-language code page, conversion functions map both
backslash (U+005C) and the normal Unicode Yen symbol (U+00A5)
to this same character. For security reasons, your applications should
not typically allow the character U+00A5 in a Unicode string that
might be converted for use as a FAT file name.

[1] - https://msdn.microsoft.com/en-us/library/dd374047(v=vs.85).aspx
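
As an illustration of what such a check might look like, here is a small sketch that treats the backslash and the two characters mentioned above (the yen sign U+00A5 and the won sign U+20A9) as separator-like. The class and method names are hypothetical, and the list is only as complete as this message; other locales may have further cases.

```java
/** Hypothetical check for characters that may act as a path separator on some Windows code pages. */
final class PathSeparatorLookalikes {
    /** Backslash, yen sign (U+00A5) and won sign (U+20A9). */
    private static final String SEPARATOR_LIKE = "\\" + "\u00A5" + "\u20A9";

    /** Returns true if the name contains any of the separator-like characters. */
    static boolean containsSeparatorLike(String name) {
        for (int i = 0; i < name.length(); i++) {
            if (SEPARATOR_LIKE.indexOf(name.charAt(i)) >= 0)
                return true;
        }

        return false;
    }
}
```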


Best Regards,
Igor

On Tue, Dec 26, 2017 at 5:01 PM, Dmitriy Setrakyan <[hidden email]>
wrote:

> Looks good to me. Is this going to be an exception on startup? If yes, is
> it safe to release it, or should we wait till 3.0?

RE: Handling slashes in cache names

Stanislav Lukyanov
In reply to this post by dsetrakyan
Well, that’s my question too :)
Do we have any compatibility guidelines or other documents on what can or cannot be in a minor/major release?

Also, it might be helpful to add an environment variable (like IGNITE_DISABLE_CACHE_NAME_RESTRICTIONS) to restore the old behavior, just in case.
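
A rough sketch of such a switch, assuming the flag name suggested above. The class name and the lookup order (JVM system property first, then environment variable) are assumptions made for illustration, not existing Ignite behavior.

```java
/** Hypothetical opt-out switch for the proposed cache name checks. */
final class CacheNameRestrictionSwitch {
    /** Flag name as suggested in this thread. */
    static final String DISABLE_FLAG = "IGNITE_DISABLE_CACHE_NAME_RESTRICTIONS";

    /** Returns true only if the flag is set to "true" (ignoring case). */
    static boolean restrictionsDisabled() {
        String val = System.getProperty(DISABLE_FLAG);

        if (val == null)
            val = System.getenv(DISABLE_FLAG);

        // Boolean.parseBoolean(null) is false, so the checks stay on by default.
        return Boolean.parseBoolean(val);
    }
}
```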

Thanks,
Stan

From: Dmitriy Setrakyan
Sent: December 26, 2017, 17:02
To: [hidden email]
Subject: Re: Handling slashes in cache names

Looks good to me. Is this going to be an exception on startup? If yes, is
it safe to release it, or should we wait till 3.0?


Re: Handling slashes in cache names

Vladimir Ozerov
A cache name appears to me to be a purely logical entity. Can we simply store cache
ID in file system paths without adding any restrictions to cache names?

On Wed, Dec 27, 2017 at 2:26 PM, Stanislav Lukyanov <[hidden email]>
wrote:

> Well, that’s my question too :)
> Do we have any compatibility guidelines or other documents on what can or
> cannot be in a minor/major release?
>
> Also, it might be helpful to add an environment variable (like
> IGNITE_DISABLE_CACHE_NAME_RESTRICTIONS) to restore the old behavior, just
> in case.
>
> Thanks,
> Stan

RE: Handling slashes in cache names

Stanislav Lukyanov
In reply to this post by Igor Sapego-2
That’s interesting, thanks.
So, do you think the locale-specific file separators should be banned as well?
Handling all kinds of cases like this might be complicated.

I’d rather use something else if the cache name is not a valid file name, e.g. a hash of the cache name.
This way all corner cases can be handled at once.
The algorithm would be
1) Check that cache name doesn’t contain banned characters
2) Try to create a Path for “cache-<cache name>”
3) If failed, create a Path for “cache-<cache name hash>”
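
Steps 2 and 3 could look roughly like the sketch below. The class and method names are hypothetical, the choice of SHA-256 with hex output is an assumption (the thread only says "hash"), and step 1 is assumed to have run already. Note that `Path.resolve` does not reject a name such as "a/b" (it yields a nested path), and that which names fall back to the hash depends on the local file system.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical resolver for a cache's persistence directory. */
final class CacheDirResolver {
    /** Tries "cache-<name>" first and falls back to "cache-<hash>" if the name is not a valid path. */
    static Path resolve(Path storeRoot, String cacheName) {
        try {
            return storeRoot.resolve("cache-" + cacheName);
        }
        catch (InvalidPathException e) {
            return storeRoot.resolve("cache-" + sha256Hex(cacheName));
        }
    }

    /** Hex-encoded SHA-256 of the UTF-8 bytes of the string (64 characters, [0-9a-f] only). */
    static String sha256Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(s.getBytes(StandardCharsets.UTF_8));

            StringBuilder sb = new StringBuilder(digest.length * 2);

            for (byte b : digest)
                sb.append(String.format("%02x", b));

            return sb.toString();
        }
        catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```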

Stan

From: Igor Sapego
Sent: December 26, 2017, 17:59
To: [hidden email]
Subject: Re: Handling slashes in cache names

There are also some international features that you might want to
address. For example, instead of backslash some other characters
may be used on Windows - ¥ on the Japanese version, ₩ on the
Korean version.
See [1] for more info.

Here is the citation:
Security Considerations for Character Sets in File Names
Windows code page and OEM character sets used on
Japanese-language systems contain the Yen symbol (¥) instead of
a backslash (\). Thus, the Yen character is a prohibited character for
NTFS and FAT file systems. When mapping Unicode to
a Japanese-language code page, conversion functions map both
backslash (U+005C) and the normal Unicode Yen symbol (U+00A5)
to this same character. For security reasons, your applications should
not typically allow the character U+00A5 in a Unicode string that
might be converted for use as a FAT file name.

[1] - https://msdn.microsoft.com/en-us/library/dd374047(v=vs.85).aspx


Best Regards,
Igor


Re: Handling slashes in cache names

Pavel Tupitsyn
Agree with Stan and Vladimir.
We should not impose any restrictions on cache names; some users may have
issues with that.

Using cache names as file names is an internal implementation detail.
We can use cache id or some kind of encoding (base64, etc) to avoid file
system issues.
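
To make the encoding idea concrete, here is a minimal sketch using the URL-safe Base64 alphabet from `java.util.Base64` (which uses '-' and '_' instead of '+' and '/'). The class and method names are hypothetical and the "cache-" prefix is assumed from the earlier messages.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical reversible mapping between cache names and directory names. */
final class CacheNameCodec {
    private static final String PREFIX = "cache-";

    /** Encodes the UTF-8 bytes of the cache name with URL-safe Base64, without '=' padding. */
    static String toDirName(String cacheName) {
        return PREFIX + Base64.getUrlEncoder().withoutPadding()
            .encodeToString(cacheName.getBytes(StandardCharsets.UTF_8));
    }

    /** Reverses toDirName. */
    static String fromDirName(String dirName) {
        byte[] raw = Base64.getUrlDecoder().decode(dirName.substring(PREFIX.length()));

        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

Two caveats with this particular choice: Base64 output is case-sensitive, so two encoded names can differ only by letter case, which matters on case-insensitive file systems; and the encoded form is about a third longer than the original bytes, which matters for length limits. A hex or Base32 encoding would avoid the first issue at the cost of longer names.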

Thanks,
Pavel

On Wed, Dec 27, 2017 at 2:38 PM, Stanislav Lukyanov <[hidden email]>
wrote:

> That’s interesting, thanks.
> So, do you think the locale-specific file separators should be banned as
> well?
> Handling all kinds of cases like this might be complicated.
>
> I’d rather use something else if the cache name is not a valid file name,
> a hash of the cache name.
> This way all corner cases can be handled at once.
> The algorithm would be
> 1) Check that cache name doesn’t contain banned characters
> 2) Try to create a Path for “cache-<cache name>”
> 3) If failed, create a Path for “cache-<cache name hash>”
>
> Stan
>
> From: Igor Sapego
> Sent: 26 декабря 2017 г. 17:59
> To: [hidden email]
> Subject: Re: Handling slashes in cache names
>
> There are also some international features that you might want to
> address. For example, instead of backslash some other characters
> may be used on Windows - ¥ on the Japanese version, ₩ on the
> Korean version.
> See [1] for more info.
>
> Here is the citation:
> Security Considerations for Character Sets in File Names
> Windows code page and OEM character sets used on
> Japanese-language systems contain the Yen symbol (¥) instead of
> a backslash (\). Thus, the Yen character is a prohibited character for
> NTFS and FAT file systems. When mapping Unicode to
> a Japanese-language code page, conversion functions map both
> backslash (U+005C) and the normal Unicode Yen symbol (U+00A5)
> to this same character. For security reasons, your applications should
> not typically allow the character U+00A5 in a Unicode string that
> might be converted for use as a FAT file name.
>
> [1] - https://msdn.microsoft.com/en-us/library/dd374047(v=vs.85).aspx
>
>
> Best Regards,
> Igor
>
> On Tue, Dec 26, 2017 at 5:01 PM, Dmitriy Setrakyan <[hidden email]>
> wrote:
>
> > Looks good to me. Is this going to be an exception on startup? If yes, is
> > it safe to release it, or should we wait till 3.0?

RE: Handling slashes in cache names

Stanislav Lukyanov
In reply to this post by Vladimir Ozerov
We can – by mapping a cache name to some (safe) string to be used as a directory name, say via Base64 as Pavel has suggested.

However, I think that banning certain characters might be reasonable.
Some characters might be considered reserved (e.g. slashes, colons, asterisks, etc.) for later use, in case some future feature requires cache names to have an actual meaning.
Some characters might be banned just as a precaution (e.g. control characters or whitespace) because they might cause problems with logging or elsewhere (you might have a bad time processing a cache name with \0 in it :) ).
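
[Editor's sketch] The kind of check discussed here could look roughly like the Java below. It is only an illustration under stated assumptions, not Ignite code: the class name `CacheNameCheck`, the method `rejectionReason` and the 255-character limit are made up for this example. Note that the JDK has no `Character.isWhitespaceCharacter` or `Character.isISOCharacter`; the closest existing methods are `Character.isWhitespace` and `Character.isISOControl`, which the sketch uses.

```java
import java.util.Optional;

// Hypothetical helper, not part of Ignite: rejects names containing
// whitespace, control characters or characters reserved on Windows.
final class CacheNameCheck {
    // Characters reserved in Windows file names, as listed in this thread.
    private static final String WINDOWS_RESERVED = "<>:\"/\\|?*";

    // Assumed limit, taken from the "say, up to 255" suggestion.
    private static final int MAX_LENGTH = 255;

    /** Returns the reason a name is rejected, or empty if it is acceptable. */
    static Optional<String> rejectionReason(String name) {
        if (name == null || name.isEmpty())
            return Optional.of("name is empty");

        if (name.length() > MAX_LENGTH)
            return Optional.of("name is longer than " + MAX_LENGTH + " characters");

        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);

            if (Character.isWhitespace(c))
                return Optional.of("whitespace at index " + i);

            if (Character.isISOControl(c))
                return Optional.of("control character at index " + i);

            if (WINDOWS_RESERVED.indexOf(c) >= 0)
                return Optional.of("reserved character '" + c + "' at index " + i);
        }

        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"myCache", "my cache", "a/b", "bad\0name"})
            System.out.println(rejectionReason(name).orElse("ok"));
    }
}
```

Returning a reason rather than a boolean is just one possible design; it makes the eventual exception message more helpful to the user.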

The question is whether or not these considerations are worth adding code and/or changing existing behavior.

BTW Java folks had a similar discussion on Java module names, resulting in http://mail.openjdk.java.net/pipermail/jpms-spec-experts/2016-December/000515.html.

Thanks,
Stan

From: Vladimir Ozerov
Sent: December 27, 2017, 14:37
To: [hidden email]
Subject: Re: Handling slashes in cache names

A cache name appears to me to be a purely logical entity. Can we simply store the cache
ID in file system paths without adding any restrictions to cache names?



Re: Handling slashes in cache names

vveider
Banning special characters seems to be an exclusion-based (blacklist) approach that cannot be kept under control if new symbols arise in the future.
Maybe a better solution would be to choose a set of permitted symbols for cache names (e.g. [a-zA-Z0-9_-])?
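
[Editor's sketch] A whitelist of this kind maps naturally onto a regular expression. The Java below is a minimal illustration, assuming the character set quoted above; the class name `CacheNameWhitelist` is hypothetical and nothing here is taken from the Ignite code base.

```java
import java.util.regex.Pattern;

// Hypothetical whitelist check: only ASCII letters, digits, '_' and '-' pass.
final class CacheNameWhitelist {
    // The trailing '-' inside the character class is a literal hyphen.
    private static final Pattern ALLOWED = Pattern.compile("[a-zA-Z0-9_-]+");

    static boolean isAllowed(String name) {
        return name != null && ALLOWED.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"my_cache-1", "my cache", "a/b", "кэш"})
            System.out.println(name + " -> " + isAllowed(name));
    }
}
```

The last sample shows the cost of this approach: any non-ASCII name is rejected, which is exactly the kind of restriction some participants in this thread want to avoid.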


Also +1 for using an abstract hash string for directory names.


> On 27 Dec 2017, at 15:14, Stanislav Lukyanov <[hidden email]> wrote:
>
> We can – by mapping a cache name to some (safe) string to be used as a directory name, say via Base64 as Pavel has suggested.
>
> However, I think that banning certain characters might be reasonable.
> Some characters might be considered reserved (e.g. slashes, colons, asterisks, etc.) for later use, in case some future feature requires cache names to have an actual meaning.
> Some characters might be banned just as a precaution (e.g. control characters or whitespace) because they might cause problems with logging or elsewhere (you might have a bad time processing a cache name with \0 in it :) ).
>
> The question is whether or not these considerations are worth adding code and/or changing existing behavior.
>
> BTW Java folks had a similar discussion on Java module names, resulting in http://mail.openjdk.java.net/pipermail/jpms-spec-experts/2016-December/000515.html.
>
> Thanks,
> Stan


Re: Handling slashes in cache names

Igor Sapego-2
I personally like Pavel's suggestion - base64 encoding seems like
a good solution, while string hashes would raise a collision issue.
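
[Editor's sketch] The difference between the two options can be shown in a few lines of Java. This is only an illustration, assuming the URL-safe Base64 alphabet (which uses '-' and '_' instead of '+' and '/'); the class name `CacheDirNames` and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: reversible mapping between cache names and directory names.
final class CacheDirNames {
    /** Encodes a cache name with URL-safe Base64, so the result contains no '/'. */
    static String encode(String cacheName) {
        return Base64.getUrlEncoder().withoutPadding()
            .encodeToString(cacheName.getBytes(StandardCharsets.UTF_8));
    }

    /** Restores the original cache name from a directory name. */
    static String decode(String dirName) {
        return new String(Base64.getUrlDecoder().decode(dirName), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String dir = encode("orders/2017");
        System.out.println(dir + " <- " + decode(dir));

        // Hashes are not reversible and may collide: these two strings
        // have the same String.hashCode() value.
        System.out.println("Aa".hashCode() == "BB".hashCode());
    }
}
```

Because the encoding is reversible, two different cache names can never map to the same directory, which is not guaranteed for a hash.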

Best Regards,
Igor

On Wed, Dec 27, 2017 at 3:29 PM, Petr Ivanov <[hidden email]> wrote:

> Banning special characters seems to be an exclusion-based (blacklist) approach that
> cannot be kept under control if new symbols arise in the future.
> Maybe a better solution would be to choose a set of permitted symbols for
> cache names (e.g. [a-zA-Z0-9_-])?
>
>
> Also +1 for using an abstract hash string for directory names.

Re: Handling slashes in cache names

Igor Sapego-2
Also, considering the case-insensitivity issue, we need to choose
an encoding that uses only upper-case or only lower-case letters in
its output.

By the way, such an encoding would also resolve cache name clashes
caused by the case-insensitivity issue.
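
[Editor's sketch] Base64 mixes upper-case and lower-case letters, so it does not meet this requirement by itself. One simple single-case encoding is lower-case hex of the UTF-8 bytes, sketched below in Java. This is an assumption chosen for illustration (Base32 would be another candidate), and the class name `CaseSafeDirNames` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encodes a cache name using only [0-9a-f],
// so the result is safe on case-insensitive file systems.
final class CaseSafeDirNames {
    static String toHex(String cacheName) {
        StringBuilder sb = new StringBuilder();

        for (byte b : cacheName.getBytes(StandardCharsets.UTF_8))
            sb.append(String.format("%02x", b & 0xFF));

        return sb.toString();
    }

    public static void main(String[] args) {
        // Names that differ only by case get clearly different directory names.
        System.out.println(toHex("mycache"));
        System.out.println(toHex("MyCache"));
    }
}
```

The trade-off is that the directory name doubles in length and is no longer human readable, which is the concern raised later in the thread.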

Best Regards,
Igor

On Wed, Dec 27, 2017 at 4:18 PM, Igor Sapego <[hidden email]> wrote:

> I personally like Pavel's suggestion - base64 encoding seems like
> a good solution, while string hashes would raise a collision issue.
>
> Best Regards,
> Igor

Re: Handling slashes in cache names

dsetrakyan
In reply to this post by Pavel Tupitsyn
On Wed, Dec 27, 2017 at 3:42 AM, Pavel Tupitsyn <[hidden email]>
wrote:

> Agree with Stan and Vladimir.
> We should not impose any restrictions on cache names, some users may have
> issues with that.
>
> Using cache names as file names is an internal implementation detail.
> We can use cache id or some kind of encoding (base64, etc) to avoid file
> system issues.
>
>
Pavel, I disagree. I want to look at the file system and be able to clearly
tell which folder belongs to which cache. If you use encryption or some
other encoding, this would be impossible.

I doubt that introducing cache name validation for *persistent* caches
would affect any existing users. It sounds like for non-persistent caches
the validation is not needed, right?

D.

Re: Handling slashes in cache names

Vladimir Ozerov
Having different policies for persistent and non-persistent caches sounds
like a bad idea to me, because there could be trouble should a user try to
switch to persistent mode. It would require code changes.

Can we just escape all non-Latin symbols (e.g. using base64), while leaving
the rest as is? With this approach, in most cases the cache name will remain the
same, and only multibyte characters would be affected.

On Wed, Dec 27, 2017 at 5:15 PM, Dmitriy Setrakyan <[hidden email]>
wrote:

> On Wed, Dec 27, 2017 at 3:42 AM, Pavel Tupitsyn <[hidden email]>
> wrote:
>
> > Agree with Stan and Vladimir.
> > We should not impose any restrictions on cache names, some users may have
> > issues with that.
> >
> > Using cache names as file names is an internal implementation detail.
> > We can use cache id or some kind of encoding (base64, etc) to avoid file
> > system issues.
> >
> >
> Pavel, I disagree. I want to look at the file system and be able to clearly
> tell which folder belongs to which cache. If you use encryption or some
> other encoding, this would be impossible.
>
> I doubt that introducing cache name validation for *persistent* caches
> would affect any existing users. It sounds like for non-persistent caches
> the validation is not needed, right?
>
> D.
>

Re: Handling slashes in cache names

dsetrakyan
On Wed, Dec 27, 2017 at 6:25 AM, Vladimir Ozerov <[hidden email]>
wrote:

> Having different policies for persistent and non-persistent caches sounds
> like a bad idea to me, because there could be trouble should a user try to
> switch to persistent mode. It would require code changes.
>
> Can we just escape all non-Latin symbols (e.g. using base64), while leaving
> the rest as is? With this approach, in most cases the cache name will remain the
> same, and only multibyte characters would be affected.
>

Agree, if we can keep cache names in human readable form. Would be nice to
see some examples.

Re: Handling slashes in cache names

Pavel Tupitsyn
Yep, base64 is just an example.
We need some kind of urlencode, but tailored for file names, so that
names remain readable.

To avoid uppercase/lowercase collisions on Windows, we can restrict the allowed
characters to lowercase English letters, digits, - and _, and escape everything
else in some way.
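
[Editor's sketch] Since examples were requested above, here is one way such a file-name-oriented escaping could look in Java. It is a sketch under stated assumptions: the '%' escape character and two-digit lower-case hex are borrowed from URL encoding, and the class name `FileNameEscaper` is hypothetical. Nothing here reflects an actual Ignite implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical escaper: keeps [a-z0-9_-] as is and writes every other
// UTF-8 byte as '%' followed by two lower-case hex digits.
final class FileNameEscaper {
    private static boolean isSafe(int b) {
        return (b >= 'a' && b <= 'z') || (b >= '0' && b <= '9') || b == '-' || b == '_';
    }

    static String escape(String cacheName) {
        StringBuilder sb = new StringBuilder();

        for (byte raw : cacheName.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;

            if (isSafe(b))
                sb.append((char) b);
            else
                sb.append('%').append(String.format("%02x", b));
        }

        return sb.toString();
    }

    static String unescape(String dirName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        for (int i = 0; i < dirName.length(); i++) {
            char c = dirName.charAt(i);

            if (c == '%') {
                // The two characters after '%' are the hex value of one byte.
                out.write(Integer.parseInt(dirName.substring(i + 1, i + 3), 16));
                i += 2;
            }
            else
                out.write(c);
        }

        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"mycache", "MyCache", "a/b c"})
            System.out.println(name + " -> " + escape(name));
    }
}
```

With this scheme "mycache" stays "mycache", "MyCache" becomes "%4dy%43ache" and "a/b c" becomes "a%2fb%20c", so lower-case ASCII names remain fully readable while the mapping stays reversible.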

On Wed, Dec 27, 2017 at 5:36 PM, Dmitriy Setrakyan <[hidden email]>
wrote:

> Agree, if we can keep cache names in human readable form. Would be nice to
> see some examples.
>

Re: Handling slashes in cache names

Sergey Kozlov
Igniters

Using the cache name for file and directory names on a file system is a bad idea.
In that case we have to keep in mind the many limitations that vary across file systems.
Why not map the cache name to an identifier that is tolerant of file system limitations?
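
[Editor's sketch] One possible reading of this proposal is a registry that assigns each cache name a numeric identifier and builds the directory name from it. The Java below is only an illustration: the class `CacheDirRegistry` and the sequential numbering are assumptions, and a real implementation would also have to persist the mapping so that it survives restarts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory registry: cache name -> sequential identifier.
final class CacheDirRegistry {
    private final Map<String, Integer> ids = new LinkedHashMap<>();

    /** Returns the directory name for a cache, assigning a new id on first use. */
    synchronized String dirFor(String cacheName) {
        Integer id = ids.get(cacheName);

        if (id == null) {
            id = ids.size() + 1;
            ids.put(cacheName, id);
        }

        // "cache-" is the directory prefix mentioned at the start of the thread.
        return "cache-" + id;
    }

    public static void main(String[] args) {
        CacheDirRegistry registry = new CacheDirRegistry();

        for (String name : new String[] {"a/b", "MyCache", "a/b"})
            System.out.println(name + " -> " + registry.dirFor(name));
    }
}
```

Directory names produced this way never depend on the characters of the cache name, at the price of needing the mapping to tell which folder belongs to which cache.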

On Wed, Dec 27, 2017 at 7:05 PM, Pavel Tupitsyn <[hidden email]>
wrote:

> Yep, base64 is just an example.
> We need some kind of urlencode, but tailored for file names, so that
> names remain readable.
>
> To avoid uppercase/lowercase collisions on Windows, we can restrict allowed
> characters
> to lowercase English letters and numbers, - and _, and escape everything
> else in some way.
>



--
Sergey Kozlov
GridGain Systems
www.gridgain.com

Re: Handling slashes in cache names

dsetrakyan
In reply to this post by Pavel Tupitsyn
On Wed, Dec 27, 2017 at 8:05 AM, Pavel Tupitsyn <[hidden email]>
wrote:

> Yep, base64 is just an example.
> We need some kind of urlencode, but tailored for file names, so that
> names remain readable.
>
> To avoid uppercase/lowercase collisions on Windows, we can restrict allowed
> characters to lowercase English letters and numbers, - and _, and escape
> everything
> else in some way.
>

I think that we should allow users to specify any case they like, but
internally we should always convert to upper or lower case, whichever one
we choose.
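
[Editor's sketch] Converting internally to a single case also makes it easy to detect names that would clash on a case-insensitive file system. The Java below is a minimal illustration; the class name `CaseInsensitiveNames` is hypothetical, and `Locale.ROOT` is used so that the conversion does not depend on the machine's default locale.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical registry that treats names differing only by case as the same name.
final class CaseInsensitiveNames {
    private final Set<String> taken = new HashSet<>();

    /** Returns false if a name equal to this one, ignoring case, is already registered. */
    boolean register(String cacheName) {
        return taken.add(cacheName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        CaseInsensitiveNames names = new CaseInsensitiveNames();

        for (String name : new String[] {"mycache", "MyCache", "other"})
            System.out.println(name + " -> " + names.register(name));
    }
}
```

The user-supplied spelling can still be kept in the cache configuration; only the key used for the file system and for the uniqueness check is normalized.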