Apache Ignite Developers - Legacy Mail Archive

Cassandra store questions

8 messages

Valentin Kulichenko

Cassandra store questions

Hi Igor,

I've got a couple of quick questions about the Cassandra store.

1. In [1] you suggested passing an explicit query as a parameter to the
loadCache() method, because otherwise the user always got an empty result.
Is providing the query a requirement? What if I just call
loadCache(null)?
2. There is a ticket [2] about parallel load in the Cassandra store. Does
this mean that it currently loads data only in a single thread? If so, do
you have any plans to implement this improvement?

[1] http://apache-ignite-users.70518.x6.nabble.com/Cannot-query-on-a-cache-using-Cassandra-as-a-persistent-store-td7870.html
[2] https://gridgain.freshdesk.com/helpdesk/tickets/2180

Thanks,
Val

irudyak

Re: Cassandra store questions

Hi Val,

1) If you call loadCache(null), it does nothing. You need to provide at
least one CQL query.

2) It depends. If you provide more than one CQL query, the store uses a
separate thread for each query (the maximum number of threads is limited
to the number of CPU cores). However, each CQL query gets only one thread,
which loads all the data that query returns. The same CQL query is also
run from ALL Ignite nodes to load the same data, which is bad. That's
because the loadCache method is executed on each Ignite node. As you can
see, loading data from Cassandra just by specifying a CQL query is not
very efficient.

The ticket I created is all about loading data from one table (or from
multiple tables) in parallel by partitioning it. That way, each Ignite
node is responsible for loading data from a specific partition range of
the Cassandra table, which is much more efficient. To support this kind of
cache warm-up, you have to design your Cassandra table in a specific way:
there should be a mapping from each Ignite partition to a set of Cassandra
partitions. Yes, I have plans to implement this.

Igor Rudyak
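
The threading behavior described in point 2 can be sketched in plain Java.
This is a minimal illustration, not the actual Cassandra store code: the
class name, the runQuery callback, and the ks.t1/ks.t2 tables are all
assumptions made for the example. It only shows the shape of "one thread
per query, capped at the number of CPU cores".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustrative sketch only; this is not the Ignite Cassandra store code.
final class PerQueryLoadSketch {
    /** One worker per query, capped at the number of CPU cores. */
    static int poolSize(int queryCount, int cpuCores) {
        return Math.max(1, Math.min(queryCount, cpuCores));
    }

    /**
     * Runs each query on a pooled worker and returns the total number of rows loaded.
     * runQuery is a hypothetical stand-in for "execute the CQL and cache its rows".
     */
    static int loadAll(List<String> queries, Function<String, Integer> runQuery) {
        if (queries == null || queries.isEmpty())
            return 0; // like loadCache(null): no query, nothing is loaded

        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize(queries.size(), cores));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String query : queries) {
                // A single thread reads everything this one query returns.
                Callable<Integer> task = () -> runQuery.apply(query);
                futures.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> future : futures)
                total += future.get();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Load interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Query failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> queries = Arrays.asList("select * from ks.t1", "select * from ks.t2");
        // Fake runner: pretends each query returned as many rows as it has characters.
        Function<String, Integer> fakeRunner = String::length;
        System.out.println("rows loaded: " + loadAll(queries, fakeRunner));
    }
}
```

Note that the sketch parallelizes across queries only. A single large query
is still drained by one thread, which is the limitation the ticket is meant
to address.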

On Thu, Oct 6, 2016 at 10:19 AM, Valentin Kulichenko <
[hidden email]> wrote:

> Hi Igor,
>
> I've got couple of quick questions about the Cassandra store.
>
> 1. In [1] you suggested to provide an explicit query as a parameter
> for loadCache() method, because otherwise user was always getting empty
> result. Is this a requirement to provide the query? What if I just call
> loadCache(null)?
> 2. There is a ticket [2] about parallel load in Cassandra store. Does
> it mean that currently it loads only in a single threaded fashion? If so,
> do you have any plans to implement this improvement?
>
> [1] http://apache-ignite-users.70518.x6.nabble.com/
> Cannot-query-on-a-cache-using-Cassandra-as-a-persistent-store-td7870.html
> [2] https://gridgain.freshdesk.com/helpdesk/tickets/2180
>
> Thanks,
> Val
>

Valentin Kulichenko

Re: Cassandra store questions

Hi Igor,

Thanks for the response!

1. This is a bit inconsistent with the other store implementations we have
in the product, and I actually find it counterintuitive. Why don't we just
load all the data available in the table? An explicit query is useful when
you want to customize this and load a subset of the data based on some
criteria. If this is not possible for some reason, then I would at least
throw an exception when no query is specified.

2. Is it possible to automatically split the data into chunks and load them
in parallel? We do this in the JDBC store, for example.

-Val
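
The two alternatives raised in point 1 (load the whole table by default, or
at least fail loudly) could look roughly like the sketch below. Everything
in it is hypothetical: the class, the method names, and the ks.my_table
name are invented for illustration and are not part of the Ignite API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only; names are hypothetical, not Ignite API.
final class LoadCacheArgsSketch {
    /** Lenient option: with no arguments, fall back to loading the whole table. */
    static List<String> resolveQueries(String table, Object... args) {
        if (args == null || args.length == 0)
            return Collections.singletonList("select * from " + table);
        return requireQueries(args);
    }

    /** Strict option: fail fast instead of silently loading nothing. */
    static List<String> requireQueries(Object... args) {
        if (args == null || args.length == 0)
            throw new IllegalArgumentException("loadCache needs at least one CQL query");
        List<String> queries = new ArrayList<>();
        for (Object arg : args) {
            if (!(arg instanceof String))
                throw new IllegalArgumentException("Expected a CQL string but got: " + arg);
            queries.add((String) arg);
        }
        return queries;
    }

    public static void main(String[] args) {
        System.out.println(resolveQueries("ks.my_table"));
        System.out.println(resolveQueries("ks.my_table", "select * from ks.my_table where id < 100"));
    }
}
```

Either option removes the silent no-op: the caller gets data or an
exception, never an unexplained empty cache.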

On Thu, Oct 6, 2016 at 11:00 PM, Igor Rudyak <[hidden email]> wrote:

> Hi Val,
>
> 1) If you'll call loadCache(null) it will do nothing. You need to provide
> at least one CQL query.
>
> 2) It depends. If you'll provide more than one CQL query, it will use
> separate thread for each of the queries (max number of threads limited to
> the number of CPU cores). But for each provided CQL query it will use only
> one thread to load all the data returned by the query. Also it will run the
> same CQL query from ALL Ignite nodes to load the same data, which is bad.
> That's because loadCache method will be executed on each Ignite node. As
> you see, it's not very efficient way to load data from Cassandra just by
> specifying CQL query. The ticket I created, is all about how to load data
> from one table (or from multiple tables as well) in parallel by
> partitioning it. Such a way each Ignite node will be responsible to load
> data from the specific partition range of Cassandra table, which is much
> more efficient. To support such kind of cache warm-up you should design
> your Cassandra table specific way - there should be some mapping from
> Ignite partition to the set of Cassandra partitions. Yes I have plans to
> implement this.
>
> Igor Rudyak

irudyak

Re: Cassandra store questions

Hi Val,

1) Well, it's not a problem to implement such default behavior, but there
is one concern. In most cases, when you use Cassandra as a persistent
store, you are going to store a large amount of data, significantly more
than the amount of RAM in your Ignite cluster. In that case it doesn't
make sense to launch a CQL query like "select * from my_table", because:
a) You still won't be able to keep all the data from the Cassandra table
in the Ignite cache.
b) All the data will be pulled from the Cassandra table using only one
thread, which is very slow.

2) Unfortunately, that's not possible in Cassandra. For JDBC, you split the
table into chunks of 512 rows each, using sub-queries and ordering by
primary key. Cassandra doesn't support this kind of thing. Probably the
only way to load data from a Cassandra table in parallel is to load it
from some specified partitions (in parallel for each partition).

Igor Rudyak
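
One way to picture the per-partition load from point 2, together with the
Ignite-partition-to-Cassandra-partitions mapping mentioned earlier in the
thread, is the sketch below. It rests on assumptions that are not in the
thread: the table has an integer partition key column named "bucket", and
bucket b belongs to Ignite partition b % ignitePartitions. The class and
method names are hypothetical as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; the "bucket" column and the modulo mapping are assumptions.
final class PartitionedLoadSketch {
    /** Cassandra buckets mapped to one Ignite partition (bucket % ignitePartitions == ignitePartition). */
    static List<Integer> bucketsFor(int ignitePartition, int ignitePartitions, int cassandraBuckets) {
        List<Integer> buckets = new ArrayList<>();
        for (int b = ignitePartition; b < cassandraBuckets; b += ignitePartitions)
            buckets.add(b);
        return buckets;
    }

    /** One single-partition CQL query per bucket behind the Ignite partitions this node owns. */
    static List<String> queriesForNode(String table, List<Integer> ownedIgnitePartitions,
                                       int ignitePartitions, int cassandraBuckets) {
        List<String> queries = new ArrayList<>();
        for (int p : ownedIgnitePartitions)
            for (int b : bucketsFor(p, ignitePartitions, cassandraBuckets))
                queries.add("select * from " + table + " where bucket = " + b);
        return queries;
    }

    public static void main(String[] args) {
        // A node that owns Ignite partitions 0 and 1 out of 4, over 8 Cassandra buckets.
        for (String q : queriesForNode("ks.my_table", Arrays.asList(0, 1), 4, 8))
            System.out.println(q);
    }
}
```

Because every node generates queries only for its own buckets, no two nodes
read the same Cassandra partition, and the per-bucket queries can be handed
to separate threads.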

On Fri, Oct 7, 2016 at 1:45 PM, Valentin Kulichenko <
[hidden email]> wrote:

> Hi Igor,
>
> Thanks for response!
>
> 1. It's a bit inconsistent with other store implementations we have in the
> product and actually I find this counterintuitive. Why don't we just load
> all the data available in the table? Explicit query is useful when you want
> to customize this and load subset of data based on some criteria. If this
> is not possible for some reason, then I would at least throw an exception
> in case query is not specified.
>
> 2. Is it possible to automatically split the data in bulks and load them
> in parallel? We do this in the JDBC store, for example.
>
> -Val

Valentin Kulichenko

Re: Cassandra store questions

Hi Igor,

1. I still think we should do this. Loading nothing is very
counterintuitive and keeps new users from getting started quickly. For
large tables, when only part of the dataset is needed, the user will of
course specify the query explicitly. Do you have any objections? If not, I
will create a ticket.

2. Got it, thanks.

-Val

On Mon, Oct 10, 2016 at 12:12 AM, Igor Rudyak <[hidden email]> wrote:

> Hi Val,
>
> 1) Well, it's not a problem to implement such default behavior, but there
> is one concern. In most cases, when you are using Cassandra as a persistent
> store you are going to store large amount of data, which is significantly
> bigger that amount of RAM in your Ignite cluster. In the such case it
> doesn't make sense to launch CQL query like "select * from my_table" cause:
> a) You still will not be able to keep all data from Cassandra table in
> Ignite cache
> b) All the data will be pulled from Cassandra table using only one
> thread - which is very slow
>
> 2) Unfortunately it's not possible in Cassandra. For JDBC you are
> splitting table into chunks of 512 rows each, using sub-queries and
> ordering by primary keys. Such kind of things are not supported in
> Cassandra. Probably the only way to load data from Cassandra table in
> parallel, is to load it from some specified partitions (in parallel for
> each partition).
>
>
> Igor Rudyak

irudyak

Re: Cassandra store questions

Hi Val,

I don't have any objections - please create a ticket and link it to the
root ticket https://issues.apache.org/jira/browse/IGNITE-1371

Igor

On Wed, Oct 12, 2016 at 4:10 PM, Valentin Kulichenko <
[hidden email]> wrote:

> Hi Igor,
>
> 1. I still think we should do this. Loading nothing is very
> counterintuitive and prevents a newbie user from quick start. For large
> tables, when only part of the dataset is needed, user will explicitly
> specify the query, of course. Do you have objections? If no, I will create
> a ticket.
>
> 2. Got it, thanks.
>
> -Val

Valentin Kulichenko

Re: Cassandra store questions

Here is the ticket: https://issues.apache.org/jira/browse/IGNITE-4075

-Val

On Wed, Oct 12, 2016 at 6:45 PM, Igor Rudyak <[hidden email]> wrote:

> Hi Val,
>
> I don't have any objections - please create a ticket and link it to the
> root ticket https://issues.apache.org/jira/browse/IGNITE-1371
>
> Igor

irudyak

Re: Cassandra store questions

Ok, thanks.

Igor

On Oct 13, 2016 4:37 PM, "Valentin Kulichenko" <
[hidden email]> wrote:

> Here is the ticket: https://issues.apache.org/jira/browse/IGNITE-4075
>
> -Val