Folks,
I am beta testing a PRAM model based parallel sorting algorithm and will integrate it with Ignite soon. My idea is to be able to use Ignite as an in memory sorting engine. Does anybody have any ideas around this especially around existing sorting functionalities? |
On Thu, Jul 09, 2015 at 10:49PM, Atri Sharma wrote:
> Folks, > > I am beta testing a PRAM model based parallel sorting algorithm and will > integrate it with Ignite soon. > > My idea is to be able to use Ignite as an in memory sorting engine. For my own education: what'd be the use case for such functionality? Thanks, Cos > Does anybody have any ideas around this especially around existing sorting > functionalities? |
Hi Cos,
Sorry I missed your email earlier. The use case around this is to have high speed sorting by maintaining in memory sorting for production servers. So I have a 512 GB RAM system which has to be able to sort efficiently but maintaining the stability and failover systems of the data. Do you see anything that I am missing here,please? On Fri, Jul 10, 2015 at 1:28 AM, Konstantin Boudnik <[hidden email]> wrote: > On Thu, Jul 09, 2015 at 10:49PM, Atri Sharma wrote: > > Folks, > > > > I am beta testing a PRAM model based parallel sorting algorithm and will > > integrate it with Ignite soon. > > > > My idea is to be able to use Ignite as an in memory sorting engine. > > For my own education: what'd be the use case for such functionality? > > Thanks, > Cos > > > Does anybody have any ideas around this especially around existing > sorting > > functionalities? > -- Regards, Atri *l'apprenant* |
Atri,
Is there any real world demand for this functionality? You know, throwing code in is easy but then this code needs to be maintained, its bad if this code is useful only for imaginary use cases. And to be honest currently I don't understand a practical purpose of what you are doing. Having said that you'd better clearly define on dev list goals and design of subsystem you are willing to implement so that committers can provide feedback as early as possible. Otherwise it may appear that you will waste your time on something that will not be accepted. Sergi 2015-07-13 9:41 GMT+03:00 Atri Sharma <[hidden email]>: > Hi Cos, > > Sorry I missed your email earlier. > > The use case around this is to have high speed sorting by maintaining in > memory sorting for production servers. So I have a 512 GB RAM system which > has to be able to sort efficiently but maintaining the stability and > failover systems of the data. > > Do you see anything that I am missing here,please? > > On Fri, Jul 10, 2015 at 1:28 AM, Konstantin Boudnik <[hidden email]> > wrote: > > > On Thu, Jul 09, 2015 at 10:49PM, Atri Sharma wrote: > > > Folks, > > > > > > I am beta testing a PRAM model based parallel sorting algorithm and > will > > > integrate it with Ignite soon. > > > > > > My idea is to be able to use Ignite as an in memory sorting engine. > > > > For my own education: what'd be the use case for such functionality? > > > > Thanks, > > Cos > > > > > Does anybody have any ideas around this especially around existing > > sorting > > > functionalities? > > > > > > -- > Regards, > > Atri > *l'apprenant* > |
Sergi,
I totally understand and appreciate your point. You are totally correct in your point that something not having definite use case should not be done. However, in this case, I think many use cases exist for in memory sorting. For eg, I am hacking on a side application which requires ordering multi regex as fast as possible in memory. In other case, streaming extracting data from HDFS and calculating file CDC for large data using multi sort method is something I had in mind as well. I think I did not come across clearly. The module I am talking about is currently a use case on top of Ignite and I was soliciting feedback if it is a good idea to integrate it with Ignite as a core module which can be offered as a functionality or used internally in case we plan to support features later on which require sorting (sort based aggregates for eg). Sorry for the ambiguity early on. I hope my idea is more clear now. On Mon, Jul 13, 2015 at 3:04 PM, Sergi Vladykin <[hidden email]> wrote: > Atri, > > Is there any real world demand for this functionality? > You know, throwing code in is easy but then this code needs to be > maintained, its bad if this code is useful only for imaginary use cases. > And to be honest currently I don't understand a practical purpose of what > you are doing. Having said that you'd better clearly define on dev list > goals and design of subsystem you are willing to implement so that > committers can provide feedback as early as possible. Otherwise it may > appear that you will waste your time on something that will not be > accepted. > > Sergi > > 2015-07-13 9:41 GMT+03:00 Atri Sharma <[hidden email]>: > > > Hi Cos, > > > > Sorry I missed your email earlier. > > > > The use case around this is to have high speed sorting by maintaining in > > memory sorting for production servers. So I have a 512 GB RAM system > which > > has to be able to sort efficiently but maintaining the stability and > > failover systems of the data. > > > > Do you see anything that I am missing here,please? > > > > On Fri, Jul 10, 2015 at 1:28 AM, Konstantin Boudnik <[hidden email]> > > wrote: > > > > > On Thu, Jul 09, 2015 at 10:49PM, Atri Sharma wrote: > > > > Folks, > > > > > > > > I am beta testing a PRAM model based parallel sorting algorithm and > > will > > > > integrate it with Ignite soon. > > > > > > > > My idea is to be able to use Ignite as an in memory sorting engine. > > > > > > For my own education: what'd be the use case for such functionality? > > > > > > Thanks, > > > Cos > > > > > > > Does anybody have any ideas around this especially around existing > > > sorting > > > > functionalities? > > > > > > > > > > > -- > > Regards, > > > > Atri > > *l'apprenant* > > > -- Regards, Atri *l'apprenant* |
In reply to this post by Atri Sharma
On Sun, Jul 12, 2015 at 11:41 PM, Atri Sharma <[hidden email]> wrote:
> Hi Cos, > > Sorry I missed your email earlier. > > The use case around this is to have high speed sorting by maintaining in > memory sorting for production servers. So I have a 512 GB RAM system which > has to be able to sort efficiently but maintaining the stability and > failover systems of the data. > > Do you see anything that I am missing here,please? What's the interconnect for this system? Thanks, Roman. |
Hi Roman,
On Tue, Jul 14, 2015 at 12:32 AM, Roman Shaposhnik <[hidden email]> wrote: > On Sun, Jul 12, 2015 at 11:41 PM, Atri Sharma <[hidden email]> wrote: > > > What's the interconnect for this system? > Not sure I got what you meant here. -- Regards, Atri *l'apprenant* |
I would say that in case of a distributed algorithm complexity lies not
only in the number of input data, but also and, more, in the number of messages exchanged between nodes to achieve the result. I agree to maintain a certain principle of locality for related data, or leave this task to a system that already has a data model suitable to scale sorting ( RDBMS ? ). Regards, Gianfranco 2015-07-14 12:14 GMT+02:00 Atri Sharma <[hidden email]>: > Hi Roman, > > On Tue, Jul 14, 2015 at 12:32 AM, Roman Shaposhnik <[hidden email]> > wrote: > > > On Sun, Jul 12, 2015 at 11:41 PM, Atri Sharma <[hidden email]> > wrote: > > > > > > What's the interconnect for this system? > > > > Not sure I got what you meant here. > > > -- > Regards, > > Atri > *l'apprenant* > |
So, consider a relational database, like postgres. A major component of
sorting performance comes from the in memory sorting that happens for this case. Normally, something like an external sort would be used in conjugation with the disk files. However, a big data analytical production use case has this requirement that the available memory to postgres for sorting is pretty huge *but* so is the data and the response time has to be really fast and oh, the data has to be streamed from the database given certain events. So what I was thinking was on these lines: 1) Add a sorting module to the engine. 2) Allow the sorting module to get the data streamed through data streamers. 3) Give sorting module access to the cache. 4) Make a sort API which can be used by an external engine to chunk sort into ignite, using streamers to stream data and distribute sort across multiple threads, and give sorted results back. Note : This is actually more of a use case for Ignite. The reasons I proposed adding it to core were: 1) Since direct interaction with data streamer and cache is needed. 2) It would be a good use case demo. 3) It might allow Ignite to be used as a pure play sorting engine thus allowing existing databases to work with it. Thoughts? On Tue, Jul 14, 2015 at 4:49 PM, Gianfranco Murador < [hidden email]> wrote: > I would say that in case of a distributed algorithm complexity lies not > only in the number of input data, but also and, more, in the number of > messages exchanged between nodes to achieve the result. > I agree to maintain a certain principle of locality for related data, or > leave this task to a system that already has a data model suitable to > scale sorting ( RDBMS ? ). > Regards, > Gianfranco > > > 2015-07-14 12:14 GMT+02:00 Atri Sharma <[hidden email]>: > > > Hi Roman, > > > > On Tue, Jul 14, 2015 at 12:32 AM, Roman Shaposhnik <[hidden email] > > > > wrote: > > > > > On Sun, Jul 12, 2015 at 11:41 PM, Atri Sharma <[hidden email]> > > wrote: > > > > > > > > > What's the interconnect for this system? > > > > > > > Not sure I got what you meant here. > > > > > > -- > > Regards, > > > > Atri > > *l'apprenant* > > > -- Regards, Atri *l'apprenant* |
I believe that an "reduce" function is is appropriate for this type of task
and is generic enough to sort by any criteria. Maybe I'm wrong, but that's just my opinion. Regards, Gianfranco 2015-07-14 15:11 GMT+02:00 Atri Sharma <[hidden email]>: > So, consider a relational database, like postgres. A major component of > sorting performance comes from the in memory sorting that happens for this > case. Normally, something like an external sort would be used in > conjugation with the disk files. However, a big data analytical production > use case has this requirement that the available memory to postgres for > sorting is pretty huge *but* so is the data and the response time has to be > really fast and oh, the data has to be streamed from the database given > certain events. > > So what I was thinking was on these lines: > > 1) Add a sorting module to the engine. > 2) Allow the sorting module to get the data streamed through data > streamers. > 3) Give sorting module access to the cache. > 4) Make a sort API which can be used by an external engine to chunk sort > into ignite, using streamers to stream data and distribute sort across > multiple threads, and give sorted results back. > > Note : This is actually more of a use case for Ignite. The reasons I > proposed adding it to core were: 1) Since direct interaction with data > streamer and cache is needed. 2) It would be a good use case demo. 3) It > might allow Ignite to be used as a pure play sorting engine thus allowing > existing databases to work with it. > > Thoughts? > > On Tue, Jul 14, 2015 at 4:49 PM, Gianfranco Murador < > [hidden email]> wrote: > > > I would say that in case of a distributed algorithm complexity lies not > > only in the number of input data, but also and, more, in the number of > > messages exchanged between nodes to achieve the result. > > I agree to maintain a certain principle of locality for related data, or > > leave this task to a system that already has a data model suitable to > > scale sorting ( RDBMS ? ). > > Regards, > > Gianfranco > > > > > > 2015-07-14 12:14 GMT+02:00 Atri Sharma <[hidden email]>: > > > > > Hi Roman, > > > > > > On Tue, Jul 14, 2015 at 12:32 AM, Roman Shaposhnik < > [hidden email] > > > > > > wrote: > > > > > > > On Sun, Jul 12, 2015 at 11:41 PM, Atri Sharma <[hidden email]> > > > wrote: > > > > > > > > > > > > What's the interconnect for this system? > > > > > > > > > > Not sure I got what you meant here. > > > > > > > > > -- > > > Regards, > > > > > > Atri > > > *l'apprenant* > > > > > > > > > -- > Regards, > > Atri > *l'apprenant* > |
While I do agree with you in principle, I am not sure about the startup
costs and node transfer costs. This is pretty experimental so I might be re inventing the wheel :) On 14 Jul 2015 19:18, "Gianfranco Murador" <[hidden email]> wrote: > I believe that an "reduce" function is is appropriate for this type of task > and is generic enough to sort by any criteria. > Maybe I'm wrong, but that's just my opinion. > Regards, > Gianfranco > > 2015-07-14 15:11 GMT+02:00 Atri Sharma <[hidden email]>: > > > So, consider a relational database, like postgres. A major component of > > sorting performance comes from the in memory sorting that happens for > this > > case. Normally, something like an external sort would be used in > > conjugation with the disk files. However, a big data analytical > production > > use case has this requirement that the available memory to postgres for > > sorting is pretty huge *but* so is the data and the response time has to > be > > really fast and oh, the data has to be streamed from the database given > > certain events. > > > > So what I was thinking was on these lines: > > > > 1) Add a sorting module to the engine. > > 2) Allow the sorting module to get the data streamed through data > > streamers. > > 3) Give sorting module access to the cache. > > 4) Make a sort API which can be used by an external engine to chunk sort > > into ignite, using streamers to stream data and distribute sort across > > multiple threads, and give sorted results back. > > > > Note : This is actually more of a use case for Ignite. The reasons I > > proposed adding it to core were: 1) Since direct interaction with data > > streamer and cache is needed. 2) It would be a good use case demo. 3) It > > might allow Ignite to be used as a pure play sorting engine thus allowing > > existing databases to work with it. > > > > Thoughts? > > > > On Tue, Jul 14, 2015 at 4:49 PM, Gianfranco Murador < > > [hidden email]> wrote: > > > > > I would say that in case of a distributed algorithm complexity lies > not > > > only in the number of input data, but also and, more, in the number of > > > messages exchanged between nodes to achieve the result. > > > I agree to maintain a certain principle of locality for related data, > or > > > leave this task to a system that already has a data model suitable to > > > scale sorting ( RDBMS ? ). > > > Regards, > > > Gianfranco > > > > > > > > > 2015-07-14 12:14 GMT+02:00 Atri Sharma <[hidden email]>: > > > > > > > Hi Roman, > > > > > > > > On Tue, Jul 14, 2015 at 12:32 AM, Roman Shaposhnik < > > [hidden email] > > > > > > > > wrote: > > > > > > > > > On Sun, Jul 12, 2015 at 11:41 PM, Atri Sharma <[hidden email] > > > > > > wrote: > > > > > > > > > > > > > > > What's the interconnect for this system? > > > > > > > > > > > > > Not sure I got what you meant here. > > > > > > > > > > > > -- > > > > Regards, > > > > > > > > Atri > > > > *l'apprenant* > > > > > > > > > > > > > > > -- > > Regards, > > > > Atri > > *l'apprenant* > > > |
In reply to this post by Atri Sharma
On Tue, Jul 14, 2015 at 3:14 AM, Atri Sharma <[hidden email]> wrote:
> Hi Roman, > > On Tue, Jul 14, 2015 at 12:32 AM, Roman Shaposhnik <[hidden email]> > wrote: > >> On Sun, Jul 12, 2015 at 11:41 PM, Atri Sharma <[hidden email]> wrote: >> >> >> What's the interconnect for this system? >> > > Not sure I got what you meant here. How are the nodes connected physically (Ethernet, IB, etc.) and logically (TCP/IP, UDP, etc.) ? Thanks, Roman. |
Free forum by Nabble | Edit this page |