Igniters,
I'm looking at the question on SO [1] and I'm a bit confused.

We ship the ignite-hadoop module only in the Hadoop Accelerator, and without Hadoop JARs, assuming that the user will include them from the Hadoop distribution they use. That seems fine to me when the accelerator is plugged into Hadoop to run MapReduce jobs, but I can't figure out the steps required to configure HDFS as a secondary FS for IGFS. Which Hadoop JARs should be on the classpath? Is the user supposed to add them manually?

Can someone with more expertise in our Hadoop integration clarify this? I believe there is not enough documentation on this topic.

BTW, any ideas why the user gets an exception for the JobConf class, which is in the 'mapred' package? Why is a map-reduce class being used?

[1] http://stackoverflow.com/questions/34221355/apache-ignite-what-are-the-dependencies-of-ignitehadoopigfssecondaryfilesystem

-Val

Hi, Valentin,
1) First of all, note that the author of the question is not using the latest doc page, but http://apacheignite.gridgain.org/v1.0/docs/igfs-secondary-file-system . This is version 1.0, while the latest is 1.5: https://apacheignite.readme.io/docs/hadoop-accelerator. Besides, it turned out that some links in the latest doc version point to the 1.0 docs. I fixed that in the several places where I found it. Do we really need the old doc versions (1.0-1.4)?

2) Our documentation ( http://apacheignite.gridgain.org/docs/secondary-file-system ) does not provide any special setup instructions for configuring HDFS as a secondary file system in Ignite. Our docs assume that a user who wants to integrate with Hadoop follows the generic Hadoop integration instructions (e.g. http://apacheignite.gridgain.org/docs/installing-on-apache-hadoop ). It looks like the page http://apacheignite.gridgain.org/docs/secondary-file-system should be clearer about the required configuration steps (in fact, setting the HADOOP_HOME variable for the Ignite node process).

3) Hadoop JARs are correctly found by Ignite if the following conditions are met:
(a) The "Hadoop Edition" distribution is used (not the "Fabric" edition).
(b) Either the HADOOP_HOME environment variable is set (for the Apache Hadoop distribution), or the file "/etc/default/hadoop" exists and matches the Hadoop distribution used (BigTop, Cloudera, HDP, etc.).

The exact mechanism of the Hadoop classpath composition can be found in the files
IGNITE_HOME/bin/include/hadoop-classpath.sh
IGNITE_HOME/bin/include/setenv.sh .

The issue is discussed in https://issues.apache.org/jira/browse/IGNITE-372 , https://issues.apache.org/jira/browse/IGNITE-483 .

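For readers who want to check a machine before starting a node, the two conditions in (b) can be sketched as a small Python helper. This is only an illustration of the decision described in the message, not a port of hadoop-classpath.sh or setenv.sh; the function name `hadoop_location_hint` and its return strings are made up for this example, and it does not verify that the defaults file matches the installed distribution.

```python
import os
from typing import Mapping, Optional


def hadoop_location_hint(env: Mapping[str, str],
                         defaults_file: str = "/etc/default/hadoop") -> Optional[str]:
    """Describe where Hadoop would be looked up, or return None.

    Hypothetical helper: it mirrors only the two conditions named in the
    message (HADOOP_HOME set, or the defaults file present). The real
    classpath composition lives in the Ignite shell scripts.
    """
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        # Apache Hadoop distribution case: the variable alone is enough.
        return "HADOOP_HOME=" + hadoop_home
    if os.path.isfile(defaults_file):
        # Packaged distributions (BigTop, Cloudera, HDP, ...) case.
        return "defaults file " + defaults_file
    return None


if __name__ == "__main__":
    hint = hadoop_location_hint(os.environ)
    print(hint or "Neither HADOOP_HOME nor /etc/default/hadoop was found")
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the check easy to exercise with made-up values.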
Hi Ivan,
1) Yes, I think it makes sense to keep the old versions of the docs while an old version may still be in use by someone.

2) Absolutely, the time has come to add a corresponding article on readme.io. It's not the first time I've seen a question related to HDFS as a secondary FS. It has never been clear to me what exact steps I should follow to enable such a configuration. Our current suggestions look like a puzzle. I'll assemble the puzzle on my side and prepare the article. Ivan, if you don't mind, I will reach out to you directly for technical assistance if needed.

Regards,
Denis

Guys,
Why don't we include the ignite-hadoop module in Fabric? This user simply wants to configure HDFS as a secondary file system to ensure persistence. Not being able to do this in Fabric looks weird to me. And actually I don't think this is a use case for the Hadoop Accelerator.

-Val

Valya,
Because we decide whether to load the Hadoop module based on its availability on the classpath. And when the Hadoop module is loaded, certain restrictions are applied to the configuration, e.g. peerClassLoadingEnabled must be false. All this looks very inconvenient to me, but this is how things currently work.

Vladimir.

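The coupling Vladimir describes can be modelled in a few lines of Python. This is a toy model under stated assumptions, not Ignite code: `NodeConfig` and `config_problems` are hypothetical names, and the only rule encoded is the one mentioned in the message (peerClassLoadingEnabled must be false once the Hadoop module is loaded).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeConfig:
    # Hypothetical stand-in for the node configuration flag named in the thread.
    peer_class_loading_enabled: bool = False


def config_problems(cfg: NodeConfig, hadoop_module_on_classpath: bool) -> List[str]:
    """Return the restrictions violated by cfg.

    The Hadoop module is treated as loaded whenever it is on the classpath,
    which is the behaviour described in the message.
    """
    problems: List[str] = []
    if hadoop_module_on_classpath and cfg.peer_class_loading_enabled:
        problems.append(
            "peerClassLoadingEnabled must be false when the Hadoop module is loaded"
        )
    return problems


if __name__ == "__main__":
    cfg = NodeConfig(peer_class_loading_enabled=True)
    for problem in config_problems(cfg, hadoop_module_on_classpath=True):
        print(problem)
```

The point of the model is that the same configuration is valid or invalid depending only on what happens to be on the classpath, which is why shipping the module in Fabric would affect users who never touch Hadoop.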
In reply to this post by Valentin Kulichenko
To enable just IGFS persistence there is no need to use HDFS (which requires a Hadoop dependency, a configured HDFS cluster, etc.). We have requests https://issues.apache.org/jira/browse/IGNITE-1120 , https://issues.apache.org/jira/browse/IGNITE-1926 to implement persistence on top of the local file system, and we are already close to a solution.

Regarding the secondary FS doc page ( http://apacheignite.gridgain.org/docs/secondary-file-system ), I would suggest adding the following text there:
------------------------
If the Ignite node with a secondary file system is configured on a machine with a Hadoop distribution, make sure Ignite is able to find the appropriate Hadoop libraries: set the HADOOP_HOME environment variable for the Ignite process if you're using the Apache Hadoop distribution, or, if you use another distribution (HDP, Cloudera, BigTop, etc.), make sure the /etc/default/hadoop file exists and has appropriate contents.

If the Ignite node with a secondary file system is configured on a machine without a Hadoop distribution, you can manually add the necessary Hadoop dependencies to the Ignite node classpath: these are the dependencies with groupId "org.apache.hadoop" listed in the file modules/hadoop/pom.xml . Currently they are:

1. hadoop-annotations
2. hadoop-auth
3. hadoop-common
4. hadoop-hdfs
5. hadoop-mapreduce-client-common
6. hadoop-mapreduce-client-core
------------------------

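When the dependencies are added by hand, it is easy to miss one of the six artifacts. The following Python sketch checks a list of JAR file names against that list. It assumes the usual Maven naming `artifactId-version.jar`; the function name `missing_artifacts` and the version numbers in the usage example are made up for illustration.

```python
import re
from typing import Iterable, List

# The six org.apache.hadoop artifacts listed in the proposed doc text.
REQUIRED_ARTIFACTS = [
    "hadoop-annotations",
    "hadoop-auth",
    "hadoop-common",
    "hadoop-hdfs",
    "hadoop-mapreduce-client-common",
    "hadoop-mapreduce-client-core",
]


def missing_artifacts(jar_names: Iterable[str]) -> List[str]:
    """Return the required artifacts with no matching JAR in jar_names.

    Assumes Maven-style file names (artifactId-version.jar). Requiring a
    digit right after the artifact id keeps hadoop-hdfs-nfs-*.jar from
    being counted as hadoop-hdfs.
    """
    names = list(jar_names)
    missing = []
    for artifact in REQUIRED_ARTIFACTS:
        pattern = re.compile(r"^" + re.escape(artifact) + r"-\d.*\.jar$")
        if not any(pattern.match(name) for name in names):
            missing.append(artifact)
    return missing


if __name__ == "__main__":
    # Hypothetical directory listing; the versions are placeholders.
    jars = ["hadoop-common-2.4.1.jar", "hadoop-hdfs-2.4.1.jar", "hadoop-hdfs-nfs-2.4.1.jar"]
    print(missing_artifacts(jars))
```

In practice the input could come from `os.listdir` on whatever directory is added to the node classpath; the list of artifacts should be re-checked against modules/hadoop/pom.xml for the Ignite version in use.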
Ivan, I think this should be documented, no?
Yes, this will be documented tomorrow. I want to go through all the steps myself to check for any other obstacles a user may face.

Denis

On Mon, Dec 14, 2015 at 7:28 AM, Denis Magda <[hidden email]> wrote:
> Yes, this will be documented tomorrow. I want to go through all the steps myself, checking for any other obstacles the user may face.

Thanks, Denis! |
Ivan,
Is there any reason why we don't recommend using apache-ignite-hadoop-{version}/bin/setup-hadoop.sh/bat in our Hadoop Accelerator articles?

With setup-hadoop.sh I was able to build a valid classpath, create symlinks to the accelerator's JARs from Hadoop's libs folder automatically, and start an Ignite node that uses HDFS as a secondary FS, all in less than 10 minutes.

I just followed the instructions from apache-ignite-hadoop-{version}/HADOOP_README.txt. The instructions on readme.io <http://readme.io/> look much more complex to me: they don't mention setup-hadoop.sh/bat at all, which forces the end user to perform a manual setup.

-- Denis |
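For anyone assembling the classpath by hand instead, a quick sanity check against the six org.apache.hadoop artifacts listed earlier in this thread could look like the sketch below. The class name, the method, and the `<artifactId>-<version>.jar` file naming pattern are assumptions made for illustration and are not part of Ignite; the authoritative list is whatever modules/hadoop/pom.xml declares for the release in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper, not an Ignite class: reports which Hadoop artifacts are absent from a set of JAR names. */
final class HadoopDependencyCheck {
    /** Artifact IDs (groupId org.apache.hadoop) quoted in this thread from modules/hadoop/pom.xml. */
    static final List<String> REQUIRED = Arrays.asList(
        "hadoop-annotations",
        "hadoop-auth",
        "hadoop-common",
        "hadoop-hdfs",
        "hadoop-mapreduce-client-common",
        "hadoop-mapreduce-client-core");

    /** Returns the required artifacts for which no JAR named "<artifactId>-<version>.jar" was found. */
    static List<String> missing(List<String> jarFileNames) {
        List<String> res = new ArrayList<>();

        for (String artifact : REQUIRED) {
            // Assumption: the version starts with a digit, so "hadoop-hdfs-nfs-x.jar" does not count as hadoop-hdfs.
            Pattern p = Pattern.compile(Pattern.quote(artifact) + "-\\d.*\\.jar");

            boolean found = false;

            for (String jar : jarFileNames) {
                if (p.matcher(jar).matches()) {
                    found = true;
                    break;
                }
            }

            if (!found)
                res.add(artifact);
        }

        return res;
    }

    public static void main(String[] args) {
        // Pass the JAR file names found on the node's classpath as arguments.
        System.out.println("Missing Hadoop artifacts: " + missing(Arrays.asList(args)));
    }
}
```

An empty result only means the expected file names are present; it says nothing about version compatibility with the HDFS cluster.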
Denis, good question.
Yes, there are several reasons.
1) setup-hadoop is suitable for the Apache Hadoop distribution, but not for all others (e.g. BigTop).
2) setup-hadoop rewrites global configs (core-site.xml, mapred-site.xml), which prevents further use of the cluster without Ignite.
3) setup-hadoop needs write permission to all the folders it writes files to.
4) It is possible to provide all the required functionality without modifying any files in the existing Hadoop cluster at all, see https://issues.apache.org/jira/browse/IGNITE-483.

There were plans to remove "setup-hadoop", but that is not yet done.
In any case, I 100% agree that the presence of several different versions of the documentation is quite confusing and misleading. |
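The no-modification approach relies on the Ignite node finding the Hadoop libraries itself, under the two conditions given earlier in this thread: HADOOP_HOME is set, or /etc/default/hadoop exists. The sketch below only models that decision in Java for illustration. The class and enum names are hypothetical, and the precedence of HADOOP_HOME over the defaults file is an assumption; the real logic lives in IGNITE_HOME/bin/include/hadoop-classpath.sh and setenv.sh and may differ.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Hypothetical model of the lookup described in the thread; not Ignite's actual script logic. */
final class HadoopLibsLocator {
    /** Where the Hadoop libraries would be taken from. */
    enum Source { HADOOP_HOME, DISTRIBUTION_DEFAULTS, NOT_FOUND }

    /**
     * @param env Environment of the Ignite node process.
     * @param defaultsFile Distribution defaults file, /etc/default/hadoop in the thread.
     */
    static Source locate(Map<String, String> env, Path defaultsFile) {
        String home = env.get("HADOOP_HOME");

        // Assumption: an explicitly set HADOOP_HOME is checked first.
        if (home != null && !home.trim().isEmpty())
            return Source.HADOOP_HOME;

        // Otherwise fall back to the file shipped by BigTop, Cloudera, HDP, etc.
        if (Files.isRegularFile(defaultsFile))
            return Source.DISTRIBUTION_DEFAULTS;

        return Source.NOT_FOUND;
    }

    public static void main(String[] args) {
        Source src = locate(System.getenv(), Paths.get("/etc/default/hadoop"));

        System.out.println("Hadoop libraries source: " + src);
    }
}
```

A NOT_FOUND result corresponds to the situation in the original Stack Overflow question, where Hadoop classes such as JobConf could not be loaded.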
Hi Ivan,
Thanks for the clarification.

Actually, I've modified the content of the following pages:

- Added an "Automatic Hadoop Configuration" section that describes the usage of setup-hadoop, with all its pros and cons, for Apache Hadoop and CDH:
http://apacheignite.gridgain.org/v1.5/docs/installing-on-apache-hadoop#automatic-hadoop-configuration
http://apacheignite.gridgain.org/docs/installing-on-cloudera-cdh

- Provided more info on how to use HDFS as a secondary file system for IGFS, based on your answer from yesterday and referring to the updated configuration guides:
http://apacheignite.gridgain.org/docs/secondary-file-system

As an IGFS & Hadoop expert, please review my changes and edit them wherever required.

In addition, I noticed that we have a disabled and empty article for the BigTop distribution. Is this OK?

-- Denis |
Hi, Denis,
1) In my opinion, we'd better not mention the 'setup-hadoop' script at all (for the reasons mentioned above), and we should delete it in the nearest release.
2) Ignite is now part of the BigTop distribution (see https://issues.apache.org/jira/browse/IGNITE-665), so the old BigTop instruction is no longer relevant. I guess this is why the BigTop article is disabled and empty. |
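For readers following along, the two conditions given in this thread for Ignite to find the Hadoop libraries (HADOOP_HOME set for the Apache Hadoop distribution, or /etc/default/hadoop present for BigTop, Cloudera, HDP, etc.) can be sketched as a small check. This is only an illustration, not the actual logic of IGNITE_HOME/bin/include/hadoop-classpath.sh: the function name, the return values, and the order in which the two conditions are tested are assumptions made for the sketch.

```python
import os

# Path named in the thread for non-Apache distributions (BigTop, Cloudera, HDP).
DEFAULTS_FILE = "/etc/default/hadoop"


def hadoop_libs_source(env, path_exists=os.path.exists):
    """Report which condition from the thread is satisfied, if any.

    env         -- mapping of environment variables, e.g. os.environ
    path_exists -- predicate used to test for DEFAULTS_FILE (injectable for tests)

    Returns "HADOOP_HOME", DEFAULTS_FILE, or None. Checking HADOOP_HOME first
    is an assumption of this sketch, not a documented precedence.
    """
    if env.get("HADOOP_HOME"):
        return "HADOOP_HOME"
    if path_exists(DEFAULTS_FILE):
        return DEFAULTS_FILE
    return None


if __name__ == "__main__":
    source = hadoop_libs_source(os.environ)
    if source is None:
        print("Neither HADOOP_HOME nor " + DEFAULTS_FILE + " found")
    else:
        print("Hadoop libraries would be located via " + source)
```

Running it in the same environment as the Ignite node process shows which of the two conditions, if either, that process would see.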
In reply to this post by Valentin Kulichenko
The integration with external systems like HDFS is a complex topic and should generally not be solved at the level of software that has no control over a user's environment (yes, I am talking about Ignite). In Bigtop we do a lot of this, including guaranteeing that the version of HDFS that Ignite has been built against will be in the cluster, etc.
Generally speaking, if someone declines to use orchestration and deployment software similar to Bigtop, finding the correct libs is their own responsibility. I would advise against loading extra modules or redistributing libs from another project just to work around someone's inability to correctly configure their own cluster.
Cos |
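For someone who does take on finding the libs themselves, a quick way to see what is missing is to compare a directory of JARs against the six "org.apache.hadoop" artifacts that were listed in this thread as the dependencies in modules/hadoop/pom.xml at the time. The sketch below is hypothetical: the function name is made up, the list may have changed since December 2015, and it assumes JAR files follow the usual Maven "artifactId-version.jar" naming.

```python
import os
import re
import sys

# Artifacts of groupId org.apache.hadoop named in the thread (as of Dec 2015).
REQUIRED_ARTIFACTS = [
    "hadoop-annotations",
    "hadoop-auth",
    "hadoop-common",
    "hadoop-hdfs",
    "hadoop-mapreduce-client-common",
    "hadoop-mapreduce-client-core",
]


def missing_artifacts(jar_names, required=REQUIRED_ARTIFACTS):
    """Return the required artifacts that have no matching JAR file name.

    A JAR matches when its name is "<artifact>-<version>.jar" and the version
    starts with a digit, so "hadoop-hdfs-nfs-*.jar" does not count as
    "hadoop-hdfs".
    """
    missing = []
    for artifact in required:
        pattern = re.compile(r"^" + re.escape(artifact) + r"-\d.*\.jar$")
        if not any(pattern.match(name) for name in jar_names):
            missing.append(artifact)
    return missing


if __name__ == "__main__":
    # Directory to inspect; defaults to the current directory.
    lib_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    print("Missing:", missing_artifacts(os.listdir(lib_dir)))
```

This only checks file names. It says nothing about whether the versions found match the Hadoop version Ignite was built against, which is the guarantee a distribution like Bigtop provides.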