By bytes access to binary format

Vladislav Pyatkov
Hi,

Recently I heard an interesting idea from an Ignite user.
What if I want to pass some data from the cache to a Java stream?

With binary objects I do it like this:

BinaryObject get = (BinaryObject) cache.get(key);
byte[] dataFromCache = get.<byte[]>field("data");
System.out.write(dataFromCache, 0, dataFromCache.length);

But in this case we produce a lot of garbage, because a new byte array is
created on every read.

This will lead to many GC events if we load millions of entries.
Could we offer an additional API for working with Java streams:

BinaryObject.writeBytesToBuf("data", ByteBuffer.allocate(1024));

or with a plain byte array as the buffer:

BinaryObject.writeBytesToBuf("data", new byte[1000], 100);

I already created a Jira ticket.
https://issues.apache.org/jira/browse/IGNITE-5602
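
To make the idea concrete, here is a minimal sketch of how a caller could stream a field through one reusable buffer. It is only an illustration under stated assumptions: FakeBinaryObject is a hypothetical stand-in that models a single byte[] field, it is not an Ignite class, and the extra fieldPos parameter is an addition that is not in the signatures proposed above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical sketch; none of these types are part of the Ignite API. */
final class ReusableBufferSketch {
    /** Stand-in for a binary object holding one byte[] field inside its serialized form. */
    static final class FakeBinaryObject {
        private final byte[] serialized; // whole serialized object
        private final int off;           // offset of the field payload
        private final int len;           // length of the field payload

        FakeBinaryObject(byte[] serialized, int off, int len) {
            this.serialized = serialized;
            this.off = off;
            this.len = len;
        }

        int fieldLength() {
            return len;
        }

        /**
         * Copies up to buf.remaining() bytes of the field, starting at fieldPos,
         * into the caller's buffer and returns the number of bytes copied.
         * (Assumed signature, for illustration only.)
         */
        int writeBytesToBuf(int fieldPos, ByteBuffer buf) {
            int n = Math.min(buf.remaining(), len - fieldPos);
            buf.put(serialized, off + fieldPos, n);
            return n;
        }
    }

    /** Streams the field to out in chunks; the same heap buffer is reused for every chunk. */
    static void copyField(FakeBinaryObject obj, ByteBuffer buf, OutputStream out) throws IOException {
        int pos = 0;
        while (pos < obj.fieldLength()) {
            buf.clear();
            int n = obj.writeBytesToBuf(pos, buf);
            out.write(buf.array(), buf.arrayOffset(), n);
            pos += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = {9, 9, 1, 2, 3, 4, 5, 9};      // the field payload is 1..5
        FakeBinaryObject obj = new FakeBinaryObject(raw, 2, 5);
        ByteBuffer buf = ByteBuffer.allocate(2);    // deliberately smaller than the field
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyField(obj, buf, out);
        System.out.println(Arrays.toString(out.toByteArray()));
    }
}
```

The same ByteBuffer can be passed to copyField for every entry, so reading many entries would not allocate a new array per read.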

--
Vladislav Pyatkov
Architect-Consultant "GridGain Rus" Llc.
+7 963 716 68 99

Re: By bytes access to binary format

Valentin Kulichenko
Vladislav,

Are you suggesting streaming directly from the cache, or from a binary object
that has already been copied from the cache?

-Val

On Wed, Jun 28, 2017 at 2:52 AM, Vladislav Pyatkov <[hidden email]>
wrote:


Re: By bytes access to binary format

Vladislav Pyatkov
Val,

I propose stream-like access to a binary object, because touching a field
currently copies the data twice (first when the object is copied from the
cache, and again when the field is read).

For streaming into and out of the cache I would use IGFS.
The main idea is to avoid GC pressure during massive reads from the key-value
storage.

On Wed, Jun 28, 2017 at 9:36 PM, Valentin Kulichenko <
[hidden email]> wrote:

--
Vladislav Pyatkov
Architect-Consultant "GridGain Rus" Llc.
+7 963 716 68 99

Re: By bytes access to binary format

Vladimir Ozerov
Hi Vlad,

I am not quite sure I understand the problem. Can you show what the API you
propose would look like? Remember that the "field" method can return anything
from a primitive, String or byte array to another BinaryObject. And the returned
BinaryObject can have references outside of itself, so it cannot be
serialized easily without a full rebuild.

On Thu, Jun 29, 2017 at 10:16 AM, Vladislav Pyatkov <[hidden email]>
wrote:


Re: By bytes access to binary format

Valentin Kulichenko
Vova,

Generally this can be useful. If you have a read-only binary object with a
large blob as a field, you don't want to copy this array when reading it.
Instead, we can return a ByteBuffer or a stream wrapping the corresponding
portion.
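
A minimal sketch of the "ByteBuffer wrapping the corresponding portion" idea, using only standard java.nio calls. The fieldView helper and the offset and length values are hypothetical; how a real binary object would locate the field inside its serialized form is not shown here.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch; not part of the Ignite API. */
final class ReadOnlyViewSketch {
    /** Returns a read-only view of serialized[off, off + len) without copying the bytes. */
    static ByteBuffer fieldView(byte[] serialized, int off, int len) {
        // wrap() shares the backing array, slice() makes index 0 the start of the field,
        // and asReadOnlyBuffer() stops callers from writing through the view.
        return ByteBuffer.wrap(serialized, off, len).slice().asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        byte[] serialized = {9, 9, 1, 2, 3, 9};   // the field payload is 1, 2, 3
        ByteBuffer view = fieldView(serialized, 2, 3);
        System.out.println(view.remaining());     // number of bytes in the field
        System.out.println(view.get(0));          // first byte of the field
        System.out.println(view.isReadOnly());
    }
}
```

The view shares memory with the underlying array, so this is only safe if the serialized form is not modified while the view is in use, which is why the read-only case matters.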

However, I currently don't see how this can be smoothly added to the existing
API. Vlad, do you have any concrete proposal for what it should look like?

-Val

On Thu, Jun 29, 2017 at 2:11 PM, Vladimir Ozerov <[hidden email]>
wrote:


Re: By bytes access to binary format

dsetrakyan
On Fri, Jun 30, 2017 at 10:35 AM, Valentin Kulichenko <
[hidden email]> wrote:

> Vova,
>
> Generally this can be useful. If you have a read-only binary object with a
> large blob as a field, you don't want to copy this array when reading it.
> Instead, we can return a ByteBuffer or a stream wrapping the corresponding
> portion.
>
> However, I currently don't see how this can be smoothly added to existing
> API. Vlad, do you have any concrete proposal on how it should look like?
>

Huge +1 from me. We should not require a copy for read-only data. We should
give users the ability to get the original byte stream, especially if it is
immutable.



Re: By bytes access to binary format

Vladislav Pyatkov
Hi,

Somebody has already prepared a variant of this API; see the patch in JIRA.
I see there are some questions about the implementation, but what about the
idea?

We could add getFieldStreamer to BinaryObject:


public interface BinaryObject {
    ...

    /**
     * Get reusable BinaryFieldStreamer. Useful for data streaming.
     *
     * @return new instance of BinaryFieldStreamer backed by this object.
     */
    public BinaryFieldStreamer getFieldStreamer();

    ...
}

which returns a BinaryFieldStreamer that allows stream access to a
particular field:

public interface BinaryFieldStreamer {

    /**
     * Get representation of this streamer as input stream.
     *
     * @throws UnsupportedOperationException If the field {@code fieldName} cannot be streamed.
     * @return representation of this streamer as input stream
     */
    public InputStream asInputStream(String fieldName);

}

We would throw UnsupportedOperationException if we do not know how to stream
the field; otherwise we return an InputStream wrapper.
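
As a rough illustration of that behaviour, here is a self-contained sketch. ArrayBackedStreamer and registerByteField are hypothetical names invented for the example, and the field locations are registered by hand; the patch in JIRA may work quite differently.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; only the BinaryFieldStreamer shape comes from the proposal. */
final class FieldStreamerSketch {
    /** The interface as proposed in this message. */
    interface BinaryFieldStreamer {
        InputStream asInputStream(String fieldName);
    }

    /** Streamer over one serialized array; streamable field locations are registered up front. */
    static final class ArrayBackedStreamer implements BinaryFieldStreamer {
        private final byte[] serialized;
        private final Map<String, int[]> byteFields = new HashMap<>(); // name -> {offset, length}

        ArrayBackedStreamer(byte[] serialized) {
            this.serialized = serialized;
        }

        void registerByteField(String name, int off, int len) {
            byteFields.put(name, new int[] {off, len});
        }

        @Override
        public InputStream asInputStream(String fieldName) {
            int[] loc = byteFields.get(fieldName);
            if (loc == null)
                throw new UnsupportedOperationException("Field cannot be streamed: " + fieldName);
            // ByteArrayInputStream reads the given array in place; it does not copy it.
            return new ByteArrayInputStream(serialized, loc[0], loc[1]);
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayBackedStreamer streamer = new ArrayBackedStreamer(new byte[] {9, 1, 2, 3, 9});
        streamer.registerByteField("data", 1, 3);
        try (InputStream in = streamer.asInputStream("data")) {
            for (int b = in.read(); b != -1; b = in.read())
                System.out.println(b);
        }
    }
}
```

Calling asInputStream with a field name that was never registered throws UnsupportedOperationException, which matches the contract described above.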

What do you think about this?


On Fri, Jun 30, 2017 at 8:38 PM, Dmitriy Setrakyan <[hidden email]>
wrote:




--
Vladislav Pyatkov
Architect-Consultant "GridGain Rus" Llc.
+7-929-537-79-60