Using HDFS as a secondary FS


Using HDFS as a secondary FS

Valentin Kulichenko
Igniters,

I'm looking at the question on SO [1] and I'm a bit confused.

We ship the ignite-hadoop module only in the Hadoop Accelerator and without
Hadoop JARs, assuming that users will include them from the Hadoop
distribution they use. That seems fine to me when the accelerator is plugged
into Hadoop to run MapReduce jobs, but I can't figure out the steps required
to configure HDFS as a secondary FS for IGFS. Which Hadoop JARs should be on
the classpath? Is the user supposed to add them manually?

Can someone with more expertise in our Hadoop integration clarify this? I
believe there is not enough documentation on this topic.

BTW, any ideas why the user gets an exception for the JobConf class, which
is in the 'mapred' package? Why is a MapReduce class being used at all?

[1]
http://stackoverflow.com/questions/34221355/apache-ignite-what-are-the-dependencies-of-ignitehadoopigfssecondaryfilesystem

-Val

Re: Using HDFS as a secondary FS

Ivan V.
Hi, Valentin,

1) First of all, note that the author of the question is using an outdated
doc page, namely
http://apacheignite.gridgain.org/v1.0/docs/igfs-secondary-file-system .
That is version 1.0, while the latest is 1.5:
https://apacheignite.readme.io/docs/hadoop-accelerator. Besides, it
turned out that some links in the latest doc version point to the 1.0 doc
version. I fixed that in the several places where I found it. Do we really
need the old doc versions (1.0-1.4)?

2) Our documentation (
http://apacheignite.gridgain.org/docs/secondary-file-system) does not
provide any special setup instructions for configuring HDFS as the secondary
file system in Ignite. Our docs assume that a user who wants to integrate
with Hadoop follows the generic Hadoop integration instructions (e.g.
http://apacheignite.gridgain.org/docs/installing-on-apache-hadoop). It
looks like the page
http://apacheignite.gridgain.org/docs/secondary-file-system should be
clearer about the required configuration steps (in fact, setting the
HADOOP_HOME variable for the Ignite node process).

3) Hadoop JARs are correctly found by Ignite if the following conditions
are met:
(a) the "Hadoop Edition" distribution is used (not the "Fabric" edition);
(b) either the HADOOP_HOME environment variable is set (for the Apache
Hadoop distribution), or the file /etc/default/hadoop exists and matches
the Hadoop distribution used (BigTop, Cloudera, HDP, etc.).

The exact mechanism of Hadoop classpath composition can be found in the
files IGNITE_HOME/bin/include/hadoop-classpath.sh and
IGNITE_HOME/bin/include/setenv.sh.

The issue is discussed in https://issues.apache.org/jira/browse/IGNITE-372
and https://issues.apache.org/jira/browse/IGNITE-483.
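For a rough idea, the lookup order those scripts implement could be sketched as follows; the Apache Hadoop directory layout and the sample path below are assumptions for illustration, not excerpts from the actual scripts:

```shell
# Sketch of the Hadoop library lookup order described above.
# The directory layout and /opt/hadoop path are illustrative assumptions.
resolve_hadoop_common_dir() {
    if [ -n "${HADOOP_HOME:-}" ]; then
        # Apache Hadoop distribution: common jars live under share/hadoop
        echo "$HADOOP_HOME/share/hadoop/common"
    elif [ -f /etc/default/hadoop ]; then
        # Packaged distributions (BigTop, Cloudera, HDP) export paths here
        echo "paths taken from /etc/default/hadoop"
    else
        echo "hadoop-not-found"
    fi
}

HADOOP_HOME=/opt/hadoop
resolve_hadoop_common_dir
```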


Re: Using HDFS as a secondary FS

Denis Magda
Hi Ivan,

1) Yes, I think it makes sense to keep the old versions of the docs for
as long as an old version may still be in use by someone.

2) Absolutely, the time has come to add a corresponding article on
readme.io. This is not the first time I've seen a question about HDFS as
a secondary FS.
It has never been clear to me either which exact steps to follow to
enable such a configuration; our current suggestions look like a puzzle.
I'll assemble the puzzle on my side and prepare the article. Ivan, if you
don't mind, I will reach out to you directly for any technical
assistance if needed.

Regards,
Denis


Re: Using HDFS as a secondary FS

Valentin Kulichenko
Guys,

Why don't we include the ignite-hadoop module in Fabric? This user simply
wants to configure HDFS as a secondary file system to ensure persistence.
Not having the option to do this in Fabric looks weird to me, and I
actually don't think this is a use case for the Hadoop Accelerator.

-Val


Re: Using HDFS as a secondary FS

Vladimir Ozerov
Valya,

Because we decide whether to load the Hadoop module based on its
availability on the classpath. And when the Hadoop module is loaded,
certain restrictions are applied to the configuration, e.g.
peerClassLoadingEnabled must be false.
All this looks very inconvenient to me, but this is how things currently
work.
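As a rough illustration of that detection idea (the jar names and paths below are hypothetical, not Ignite's actual layout):

```shell
# Hypothetical sketch of classpath-based module detection: the Hadoop
# integration is considered enabled only when an ignite-hadoop jar is
# present on the classpath. Paths are placeholders for illustration.
CLASSPATH="/opt/ignite/libs/ignite-core.jar:/opt/ignite/libs/ignite-hadoop.jar"

case "$CLASSPATH" in
    *ignite-hadoop*)
        # With the module loaded, configuration restrictions apply,
        # e.g. peerClassLoadingEnabled must be false.
        MODULE=enabled
        ;;
    *)
        MODULE=disabled
        ;;
esac
echo "hadoop module: $MODULE"
```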

Vladimir.


Re: Using HDFS as a secondary FS

Ivan V.
In reply to this post by Valentin Kulichenko
To enable just IGFS persistence there is no need to use HDFS (which
requires a Hadoop dependency, a configured HDFS cluster, etc.).
We have requests https://issues.apache.org/jira/browse/IGNITE-1120 and
https://issues.apache.org/jira/browse/IGNITE-1926 to implement
persistence on top of the local file system, and we are already close to
a solution.

Regarding the secondary FS doc page (
http://apacheignite.gridgain.org/docs/secondary-file-system), I would
suggest adding the following text there:
------------------------
If an Ignite node with a secondary file system is configured on a machine
with a Hadoop distribution, make sure Ignite is able to find the
appropriate Hadoop libraries: set the HADOOP_HOME environment variable for
the Ignite process if you're using the Apache Hadoop distribution, or, if
you use another distribution (HDP, Cloudera, BigTop, etc.), make sure the
file /etc/default/hadoop exists and has appropriate contents.

If an Ignite node with a secondary file system is configured on a machine
without a Hadoop distribution, you can manually add the necessary Hadoop
dependencies to the Ignite node classpath: these are the dependencies with
groupId "org.apache.hadoop" listed in the file modules/hadoop/pom.xml.
Currently they are:

   1. hadoop-annotations
   2. hadoop-auth
   3. hadoop-common
   4. hadoop-hdfs
   5. hadoop-mapreduce-client-common
   6. hadoop-mapreduce-client-core

------------------------
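The second case (a machine without a Hadoop distribution) could be sketched like this; the jar directory and Hadoop version are placeholder assumptions, not values taken from Ignite's scripts:

```shell
# Sketch: build a classpath fragment from the Hadoop dependencies listed
# above. HADOOP_JARS_DIR and HADOOP_VER are illustrative placeholders.
HADOOP_JARS_DIR=/opt/hadoop-jars
HADOOP_VER=2.7.1

HADOOP_CP=""
for dep in hadoop-annotations hadoop-auth hadoop-common hadoop-hdfs \
           hadoop-mapreduce-client-common hadoop-mapreduce-client-core; do
    # Append each jar, separating entries with ':' after the first one
    HADOOP_CP="${HADOOP_CP:+$HADOOP_CP:}$HADOOP_JARS_DIR/$dep-$HADOOP_VER.jar"
done

echo "$HADOOP_CP"
```

The resulting string would then be appended to the Ignite node's classpath before start-up.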


Re: Using HDFS as a secondary FS

dsetrakyan
Ivan, I think this should be documented, no?


Re: Using HDFS as a secondary FS

Denis Magda
Yes, this will be documented tomorrow. I want to go through all the steps myself, checking for any other obstacles the user may face.


Denis

>>>>>> believe there is not enough documentation on this topic.
>>>>>>
>>>>>> BTW, any ideas why user gets exception for JobConf class which is in
>>>>>> 'mapred' package? Why map-reduce class is being used?
>>>>>>
>>>>>> [1]
>>>>>>
>>>>>>
>>>>>>
>>>
>> http://stackoverflow.com/questions/34221355/apache-ignite-what-are-the-dependencies-of-ignitehadoopigfssecondaryfilesystem
>>>>>>
>>>>>> -Val
>>>>>>
>>>>>>
>>>>
>>>
>>


Re: Using HDFS as a secondary FS

dsetrakyan
On Mon, Dec 14, 2015 at 7:28 AM, Denis Magda <[hidden email]> wrote:

> Yes, this will be documented tomorrow. I want to go through all the steps
> by myself, checking all other possible obstacles the user may face.
>

Thanks, Denis!



Re: Using HDFS as a secondary FS

Denis Magda
Ivan,

Is there any reason why we don’t recommend using apache-ignite-hadoop-{version}/bin/setup-hadoop.sh/bat in our Hadoop Accelerator articles?

With setup-hadoop.sh I was able to build a valid classpath, automatically create symlinks to the accelerator's JARs from Hadoop’s libs folder, and start an Ignite node that uses HDFS as a secondary FS in less than 10 minutes.

I just followed the instructions from apache-ignite-hadoop-{version}/HADOOP_README.txt. The instructions on readme.io look much more complex to me; they don’t mention setup-hadoop.sh/bat at all, forcing the end user to perform a manual setup.


Denis

> On 14 дек. 2015 г., at 20:24, Dmitriy Setrakyan <[hidden email]> wrote:
>
> On Mon, Dec 14, 2015 at 7:28 AM, Denis Magda <[hidden email]> wrote:
>
>> Yes, this will be documented tomorrow. I want to go though all the steps
>> by myself checking all other possible obstacles the user may face with.
>>
>
> Thanks, Denis!
>
>
>>
>> —
>> Denis
>>
>>> On 14 дек. 2015 г., at 18:11, Dmitriy Setrakyan <[hidden email]>
>> wrote:
>>>
>>> Ivan, I think this should be documented, no?
>>>
>>> On Mon, Dec 14, 2015 at 2:25 AM, Ivan V. <[hidden email]>
>> wrote:
>>>
>>>> To enable just an IGFS persistence there is no need to use HDFS (this
>>>> requires Hadoop dependency, requires configured HDFS cluster, etc.).
>>>> We have requests https://issues.apache.org/jira/browse/IGNITE-1120 ,
>>>> https://issues.apache.org/jira/browse/IGNITE-1926 to implement the
>>>> persistence upon local file system, and we already close to  the
>> solution.
>>>>
>>>> Regarding the secondary Fs doc page (
>>>> http://apacheignite.gridgain.org/docs/secondary-file-system) I would
>>>> suggest to add the following text there:
>>>> ------------------------
>>>> If Ignite node with secondary file system configured on a machine with
>>>> Hadoop distribution, make sure Ignite is able to find appropriate Hadoop
>>>> libraries: set HADOOP_HOME environment variable for the Ignite process
>> if
>>>> you're using Apache Hadoop distribution, or, if you use another
>>>> distribution (HDP, Cloudera, BigTop, etc.) make sure /etc/default/hadoop
>>>> file exists and has appropriate contents.
>>>>
>>>> If Ignite node with secondary file system configured on a machine
>> without
>>>> Hadoop distribution, you can manually add necessary Hadoop dependencies
>> to
>>>> Ignite node classpath: these are dependencies of groupId
>>>> "org.apache.hadoop" listed in file modules/hadoop/pom.xml . Currently
>> they
>>>> are:
>>>>
>>>>  1. hadoop-annotations
>>>>  2. hadoop-auth
>>>>  3. hadoop-common
>>>>  4. hadoop-hdfs
>>>>  5. hadoop-mapreduce-client-common
>>>>  6. hadoop-mapreduce-client-core
>>>>
>>>> ------------------------
>>>>
>>>> On Mon, Dec 14, 2015 at 11:21 AM, Valentin Kulichenko <
>>>> [hidden email]> wrote:
>>>>
>>>>> Guys,
>>>>>
>>>>> Why don't we include ignite-hadoop module in Fabric? This user simply
>>>> wants
>>>>> to configure HDFS as a secondary file system to ensure persistence. Not
>>>>> having the opportunity to do this in Fabric looks weird to me. And
>>>> actually
>>>>> I don't think this is a use case for Hadoop Accelerator.
>>>>>
>>>>> -Val
>>>>>
>>>>> On Mon, Dec 14, 2015 at 12:11 AM, Denis Magda <[hidden email]>
>>>> wrote:
>>>>>
>>>>>> Hi Ivan,
>>>>>>
>>>>>> 1) Yes, I think that it makes sense to have the old versions of the
>>>> docs
>>>>>> while an old version is still considered to be used by someone.
>>>>>>
>>>>>> 2) Absolutely, the time to add a corresponding article on the
>>>> readme.io
>>>>>> has come. It's not the first time I see the question related to HDFS
>>>> as a
>>>>>> secondary FS.
>>>>>> Before and now it's not clear for me what exact steps I should follow
>>>> to
>>>>>> enable such a configuration. Our current suggestions look like a
>>>> puzzle.
>>>>>> I'll assemble the puzzle on my side and prepare the article. Ivan if
>>>> you
>>>>>> don't mind I would reaching you out directly asking for any technical
>>>>>> assistance if needed.
>>>>>>
>>>>>> Regards,
>>>>>> Denis
>>>>>>
>>>>>>
>>>>>> On 12/14/2015 10:25 AM, Ivan V. wrote:
>>>>>>
>>>>>>> Hi, Valentin,
>>>>>>>
>>>>>>> 1) first of all note that the author of the question uses not the
>>>> latest
>>>>>>> doc page, namely
>>>>>>>
>> http://apacheignite.gridgain.org/v1.0/docs/igfs-secondary-file-system
>>>> .
>>>>>>> This is version 1.0, while the latest is 1.5:
>>>>>>> https://apacheignite.readme.io/docs/hadoop-accelerator. Besides, it
>>>>>>> appeared that some links from the latest doc version point to 1.0 doc
>>>>>>> version. I fixed that in several places where I found that. Do we
>>>> really
>>>>>>> need old doc versions (1.0 -1.4)?
>>>>>>>
>>>>>>> 2) our documentation (
>>>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system) does
>> not
>>>>>>> provide any special setup instructions to configure HDFS as secondary
>>>>> file
>>>>>>> system in Ignite. Our docs assume that if a user wants to integrate
>>>> with
>>>>>>> Hadoop, (s)he follows generic Hadoop integration instruction (e.g.
>>>>>>> http://apacheignite.gridgain.org/docs/installing-on-apache-hadoop).
>>>> It
>>>>>>> looks like the page
>>>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system should
>> be
>>>>>>> more
>>>>>>> clear regarding the required configuration steps (in fact, setting up
>>>>>>> HADOOP_HOME variable for Ignite node process).
>>>>>>>
>>>>>>> 3) Hadoop jars are correctly found by Ignite if the following
>>>> conditions
>>>>>>> are met:
>>>>>>> (a) The "Hadoop Edition" distribution is used (not a "Fabric"
>>>> edition).
>>>>>>> (b) Either HADOOP_HOME environment variable is set up (for Apache
>>>> Hadoop
>>>>>>> distribution), or file "/etc/default/hadoop" exists and matches the
>>>>> Hadoop
>>>>>>> distribution used (BigTop, Cloudera, HDP, etc.)
>>>>>>>
>>>>>>> The exact mechanism of the Hadoop classpath composition can be found
>>>> in
>>>>>>> files
>>>>>>> IGNITE_HOME/bin/include/hadoop-classpath.sh
>>>>>>> IGNITE_HOME/bin/include/setenv.sh .
>>>>>>>
>>>>>>> The issue is discussed in
>>>>>>> https://issues.apache.org/jira/browse/IGNITE-372
>>>>>>> , https://issues.apache.org/jira/browse/IGNITE-483 .
>>>>>>>
>>>>>>> On Sat, Dec 12, 2015 at 3:45 AM, Valentin Kulichenko <
>>>>>>> [hidden email]> wrote:
>>>>>>>
>>>>>>> Igniters,
>>>>>>>>
>>>>>>>> I'm looking at the question on SO [1] and I'm a bit confused.
>>>>>>>>
>>>>>>>> We ship ignite-hadoop module only in Hadoop Accelerator and without
>>>>>>>> Hadoop
>>>>>>>> JARs, assuming that user will include them from the Hadoop
>>>> distribution
>>>>>>>> he
>>>>>>>> uses. It seems OK for me when accelerator is plugged in to Hadoop to
>>>>> run
>>>>>>>> mapreduce jobs, but I can't figure out steps required to configure
>>>> HDFS
>>>>>>>> as
>>>>>>>> a secondary FS for IGFS. Which Hadoop JARs should be on classpath?
>> Is
>>>>>>>> user
>>>>>>>> supposed to add them manually?
>>>>>>>>
>>>>>>>> Can someone with more expertise in our Hadoop integration clarify
>>>>> this? I
>>>>>>>> believe there is not enough documentation on this topic.
>>>>>>>>
>>>>>>>> BTW, any ideas why user gets exception for JobConf class which is in
>>>>>>>> 'mapred' package? Why map-reduce class is being used?
>>>>>>>>
>>>>>>>> [1]
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>> http://stackoverflow.com/questions/34221355/apache-ignite-what-are-the-dependencies-of-ignitehadoopigfssecondaryfilesystem
>>>>>>>>
>>>>>>>> -Val
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>>


Re: Using HDFS as a secondary FS

Ivan V.
Denis, good question.
Yes, there are several reasons:
1) setup-hadoop is suitable for the Apache Hadoop distribution, but not for all
others (e.g. BigTop).
2) setup-hadoop rewrites global configs (core-site.xml, mapred-site.xml),
which prevents further use of the cluster without Ignite.
3) setup-hadoop needs write permission to all the folders it writes files
to.
4) It is possible to provide all the required functionality without any
file modifications in the existing Hadoop cluster at all, see
https://issues.apache.org/jira/browse/IGNITE-483.

There were plans to remove "setup-hadoop", but that has not been done yet.
In any case, I agree 100% that the presence of several different versions of
the documentation is quite confusing and misleading.
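For anyone assembling the Hadoop classpath by hand instead of running setup-hadoop, the gist of what IGNITE_HOME/bin/include/hadoop-classpath.sh does can be sketched as below. This is only a sketch: the share/hadoop/* directory layout is an assumption (typical for an Apache Hadoop 2.x install), and the real script should be consulted for the actual logic.

```shell
# Sketch: build a wildcard classpath from HADOOP_HOME.
# The share/hadoop/* layout is an assumption; see
# bin/include/hadoop-classpath.sh for the real logic.
HADOOP_HOME=/opt/hadoop

CP=""
for d in "$HADOOP_HOME/share/hadoop/common" \
         "$HADOOP_HOME/share/hadoop/hdfs" \
         "$HADOOP_HOME/share/hadoop/mapreduce"; do
  CP="$CP:$d/*"   # append each module's jar directory as a wildcard entry
done
CP="${CP#:}"      # drop the leading colon

echo "$CP"
```

The resulting string can then be appended to the Ignite node's classpath before startup.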



Re: Using HDFS as a secondary FS

Denis Magda
Hi Ivan,

Thanks for the clarification.

Actually, I’ve modified the content of the following pages:

- Added an “Automatic Hadoop Configuration” section that describes the usage of setup-hadoop, with all its pros and cons, for Apache Hadoop and CDH:
http://apacheignite.gridgain.org/v1.5/docs/installing-on-apache-hadoop#automatic-hadoop-configuration
http://apacheignite.gridgain.org/docs/installing-on-cloudera-cdh

- Provided more info on how to use HDFS as a secondary file system for IGFS, based on your answer from yesterday and referring to the updated configuration guides:
http://apacheignite.gridgain.org/docs/secondary-file-system

Please, as an IGFS & Hadoop expert, review my changes and edit them wherever required.

In addition, I noticed that we have a disabled and empty article for the BigTop distribution. Is this OK?


Denis
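For context, the wiring being documented on the secondary-file-system page looks roughly like the fragment below. This is a sketch against the 1.5-era API: the IgniteHadoopIgfsSecondaryFileSystem class name comes from the SO question referenced earlier in the thread, while the surrounding bean and property names, and the HDFS host/port, are assumptions to verify against the published page.

```xml
<!-- Sketch only: property names and the HDFS URI are assumptions;
     verify against the secondary-file-system doc page. -->
<property name="fileSystemConfiguration">
    <list>
        <bean class="org.apache.ignite.configuration.FileSystemConfiguration">
            <property name="name" value="igfs"/>
            <!-- HDFS as the secondary (persistent) file system -->
            <property name="secondaryFileSystem">
                <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
                    <constructor-arg value="hdfs://your-hdfs-host:9000"/>
                </bean>
            </property>
        </bean>
    </list>
</property>
```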

> On 15 дек. 2015 г., at 12:10, Ivan V. <[hidden email]> wrote:
>
> Denis, good question.
> Yes, there are several reasons.
> 1) setup-hadoop is suitable for Apache Hadoop distribution, but not for all
> others (e.g. BigTop)
> 2) setup-hadoop rewrites global configs (core-site.xml, mapred-site.xml),
> what prevents further cluster usage without Ignite.
> 3) setup-hadoop needs write permission to all the folders it writes files
> to.
> 4) It is possible to provide all the required functionality without any
> file modifications in the existing Hadoop cluster at all, see
> https://issues.apache.org/jira/browse/IGNITE-483.
>
> There were plans to remove "setup-hadoop", but that is not yet done.
> In any way, I 100% agree that presence of several different versions of the
> documentation is quite confusing and misleading.
>
>
> On Mon, Dec 14, 2015 at 10:58 PM, Denis Magda <[hidden email]> wrote:
>
>> Ivan,
>>
>> Is there any reason why we don’t recommend using
>> apache-ignite-hadoop-{version}/bin/setup-hadoop.sh/bat in our Hadooop
>> Accelerator articles?
>>
>> With setup-hadoop.sh I was able to build a valid classpath, create
>> symlinks to the accelerator's jars from hadoop’s libs folder automatically
>> and started an Ignite node that uses HDFS as a secondary FS in less than 10
>> minutes.
>>
>> I just followed the instructions from
>> apache-ignite-hadoop-{version}/HADOOP_README.txt. Instructions from the
>> readme.io <http://readme.io/> look much more complex for me, they don’t
>> mention setup-hadoop.sh/bat at all making the end user to perform a
>> manual setup.
>>
>> —
>> Denis
>>
>>> On 14 дек. 2015 г., at 20:24, Dmitriy Setrakyan <[hidden email]>
>> wrote:
>>>
>>> On Mon, Dec 14, 2015 at 7:28 AM, Denis Magda <[hidden email]>
>> wrote:
>>>
>>>> Yes, this will be documented tomorrow. I want to go though all the steps
>>>> by myself checking all other possible obstacles the user may face with.
>>>>
>>>
>>> Thanks, Denis!
>>>
>>>
>>>>
>>>> —
>>>> Denis
>>>>
>>>>> On 14 дек. 2015 г., at 18:11, Dmitriy Setrakyan <[hidden email]
>>>
>>>> wrote:
>>>>>
>>>>> Ivan, I think this should be documented, no?
>>>>>
>>>>> On Mon, Dec 14, 2015 at 2:25 AM, Ivan V. <[hidden email]>
>>>> wrote:
>>>>>
>>>>>> To enable just an IGFS persistence there is no need to use HDFS (this
>>>>>> requires Hadoop dependency, requires configured HDFS cluster, etc.).
>>>>>> We have requests https://issues.apache.org/jira/browse/IGNITE-1120 ,
>>>>>> https://issues.apache.org/jira/browse/IGNITE-1926 to implement the
>>>>>> persistence upon local file system, and we already close to  the
>>>> solution.
>>>>>>
>>>>>> Regarding the secondary Fs doc page (
>>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system) I would
>>>>>> suggest to add the following text there:
>>>>>> ------------------------
>>>>>> If Ignite node with secondary file system configured on a machine with
>>>>>> Hadoop distribution, make sure Ignite is able to find appropriate
>> Hadoop
>>>>>> libraries: set HADOOP_HOME environment variable for the Ignite process
>>>> if
>>>>>> you're using Apache Hadoop distribution, or, if you use another
>>>>>> distribution (HDP, Cloudera, BigTop, etc.) make sure
>> /etc/default/hadoop
>>>>>> file exists and has appropriate contents.
>>>>>>
>>>>>> If Ignite node with secondary file system configured on a machine
>>>> without
>>>>>> Hadoop distribution, you can manually add necessary Hadoop
>> dependencies
>>>> to
>>>>>> Ignite node classpath: these are dependencies of groupId
>>>>>> "org.apache.hadoop" listed in file modules/hadoop/pom.xml . Currently
>>>> they
>>>>>> are:
>>>>>>
>>>>>> 1. hadoop-annotations
>>>>>> 2. hadoop-auth
>>>>>> 3. hadoop-common
>>>>>> 4. hadoop-hdfs
>>>>>> 5. hadoop-mapreduce-client-common
>>>>>> 6. hadoop-mapreduce-client-core
>>>>>>
>>>>>> ------------------------
>>>>>>
>>>>>> On Mon, Dec 14, 2015 at 11:21 AM, Valentin Kulichenko <
>>>>>> [hidden email]> wrote:
>>>>>>
>>>>>>> Guys,
>>>>>>>
>>>>>>> Why don't we include ignite-hadoop module in Fabric? This user simply
>>>>>> wants
>>>>>>> to configure HDFS as a secondary file system to ensure persistence.
>> Not
>>>>>>> having the opportunity to do this in Fabric looks weird to me. And
>>>>>> actually
>>>>>>> I don't think this is a use case for Hadoop Accelerator.
>>>>>>>
>>>>>>> -Val
>>>>>>>
>>>>>>> On Mon, Dec 14, 2015 at 12:11 AM, Denis Magda <[hidden email]>
>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Ivan,
>>>>>>>>
>>>>>>>> 1) Yes, I think that it makes sense to have the old versions of the
>>>>>> docs
>>>>>>>> while an old version is still considered to be used by someone.
>>>>>>>>
>>>>>>>> 2) Absolutely, the time to add a corresponding article on the
>>>>>> readme.io
>>>>>>>> has come. It's not the first time I see the question related to HDFS
>>>>>> as a
>>>>>>>> secondary FS.
>>>>>>>> Before and now it's not clear for me what exact steps I should
>> follow
>>>>>> to
>>>>>>>> enable such a configuration. Our current suggestions look like a
>>>>>> puzzle.
>>>>>>>> I'll assemble the puzzle on my side and prepare the article. Ivan if
>>>>>> you
>>>>>>>> don't mind I would reaching you out directly asking for any
>> technical
>>>>>>>> assistance if needed.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Denis
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12/14/2015 10:25 AM, Ivan V. wrote:
>>>>>>>>
>>>>>>>>> Hi, Valentin,
>>>>>>>>>
>>>>>>>>> 1) first of all note that the author of the question uses not the
>>>>>> latest
>>>>>>>>> doc page, namely
>>>>>>>>>
>>>> http://apacheignite.gridgain.org/v1.0/docs/igfs-secondary-file-system
>>>>>> .
>>>>>>>>> This is version 1.0, while the latest is 1.5:
>>>>>>>>> https://apacheignite.readme.io/docs/hadoop-accelerator. Besides,
>> it
>>>>>>>>> appeared that some links from the latest doc version point to 1.0
>> doc
>>>>>>>>> version. I fixed that in several places where I found that. Do we
>>>>>> really
>>>>>>>>> need old doc versions (1.0 -1.4)?
>>>>>>>>>
>>>>>>>>> 2) our documentation (
>>>>>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system) does
>>>> not
>>>>>>>>> provide any special setup instructions to configure HDFS as
>> secondary
>>>>>>> file
>>>>>>>>> system in Ignite. Our docs assume that if a user wants to integrate
>>>>>> with
>>>>>>>>> Hadoop, (s)he follows generic Hadoop integration instruction (e.g.
>>>>>>>>> http://apacheignite.gridgain.org/docs/installing-on-apache-hadoop
>> ).
>>>>>> It
>>>>>>>>> looks like the page
>>>>>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system should be
>>>>>>>>> clearer regarding the required configuration steps (in fact, setting
>>>>>>>>> up the HADOOP_HOME variable for the Ignite node process).
>>>>>>>>>
>>>>>>>>> 3) Hadoop jars are correctly found by Ignite if the following
>>>>>> conditions
>>>>>>>>> are met:
>>>>>>>>> (a) The "Hadoop Edition" distribution is used (not a "Fabric"
>>>>>> edition).
>>>>>>>>> (b) Either HADOOP_HOME environment variable is set up (for Apache
>>>>>> Hadoop
>>>>>>>>> distribution), or file "/etc/default/hadoop" exists and matches the
>>>>>>> Hadoop
>>>>>>>>> distribution used (BigTop, Cloudera, HDP, etc.)
>>>>>>>>>
>>>>>>>>> The exact mechanism of the Hadoop classpath composition can be
>> found
>>>>>> in
>>>>>>>>> files
>>>>>>>>> IGNITE_HOME/bin/include/hadoop-classpath.sh
>>>>>>>>> IGNITE_HOME/bin/include/setenv.sh .
>>>>>>>>>
>>>>>>>>> The issue is discussed in
>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-372
>>>>>>>>> , https://issues.apache.org/jira/browse/IGNITE-483 .
>>>>>>>>>
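The resolution order described in (b) can be sketched as a tiny shell function. This is a simplified re-telling of what the release scripts do, not the actual hadoop-classpath.sh/setenv.sh code:

```shell
# Simplified sketch of the Hadoop-home resolution described above; the real
# logic lives in IGNITE_HOME/bin/include/hadoop-classpath.sh and setenv.sh.
resolve_hadoop_home() {
    if [ -n "$HADOOP_HOME" ]; then
        # Apache Hadoop distribution: the user points HADOOP_HOME at it.
        printf '%s\n' "$HADOOP_HOME"
    elif [ -f /etc/default/hadoop ]; then
        # Packaged distributions (BigTop, Cloudera, HDP, ...) ship this file;
        # sourcing it is expected to export HADOOP_HOME and friends.
        . /etc/default/hadoop
        printf '%s\n' "$HADOOP_HOME"
    else
        echo "Neither HADOOP_HOME nor /etc/default/hadoop found" >&2
        return 1
    fi
}

HADOOP_HOME=/usr/local/hadoop resolve_hadoop_home   # prints /usr/local/hadoop
```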
>>>>>>>>> On Sat, Dec 12, 2015 at 3:45 AM, Valentin Kulichenko <
>>>>>>>>> [hidden email]> wrote:


Re: Using HDFS as a secondary FS

Ivan V.
Hi, Denis,
1) my opinion is that we'd better not mention the 'setup-hadoop' script at all
(for the reasons mentioned above) and delete it in the next release.
2) Ignite is now part of the BigTop distribution (see
https://issues.apache.org/jira/browse/IGNITE-665), so the old BigTop
instruction is no longer relevant. I guess this is the reason.


On Tue, Dec 15, 2015 at 12:35 PM, Denis Magda <[hidden email]> wrote:

> Hi Ivan,
>
> Thanks for clarification.
>
> Actually I’ve modified the content of the following pages:
>
> - Added “Automatic Hadoop Configuration” section that describes the usage
> of setup-hadoop with all its pros and cons for Apache Hadoop and CDH
>
> http://apacheignite.gridgain.org/v1.5/docs/installing-on-apache-hadoop#automatic-hadoop-configuration
> http://apacheignite.gridgain.org/docs/installing-on-cloudera-cdh
>
> - Provided more info on how to use ‘HDFS’ as a secondary file system for
> ‘IGFS’ using your yesterday answer and referring to the updated
> configuration guides
> http://apacheignite.gridgain.org/docs/secondary-file-system
>
> Please, as an IGFS & Hadoop expert, review my changes and edit them
> wherever required.
>
> In addition, I noticed that we have a disabled and empty article for the
> BigTop distribution. Is this OK?
>
> —
> Denis
>
> > On 15 дек. 2015 г., at 12:10, Ivan V. <[hidden email]> wrote:
> >
> > Denis, good question.
> > Yes, there are several reasons.
> > 1) setup-hadoop is suitable for Apache Hadoop distribution, but not for
> all
> > others (e.g. BigTop)
> > 2) setup-hadoop rewrites global configs (core-site.xml, mapred-site.xml),
> > which prevents further cluster usage without Ignite.
> > 3) setup-hadoop needs write permission to all the folders it writes files
> > to.
> > 4) It is possible to provide all the required functionality without any
> > file modifications in the existing Hadoop cluster at all, see
> > https://issues.apache.org/jira/browse/IGNITE-483.
> >
> > There were plans to remove "setup-hadoop", but that is not yet done.
> > In any case, I 100% agree that the presence of several different versions
> > of the documentation is quite confusing and misleading.
> >
> >
> > On Mon, Dec 14, 2015 at 10:58 PM, Denis Magda <[hidden email]>
> wrote:
> >
> >> Ivan,
> >>
> >> Is there any reason why we don’t recommend using
> >> apache-ignite-hadoop-{version}/bin/setup-hadoop.sh/bat in our Hadooop
> >> Accelerator articles?
> >>
> >> With setup-hadoop.sh I was able to build a valid classpath, create
> >> symlinks to the accelerator's jars from hadoop’s libs folder automatically,
> >> and start an Ignite node that uses HDFS as a secondary FS in less than 10
> >> minutes.
> >>
> >> I just followed the instructions from
> >> apache-ignite-hadoop-{version}/HADOOP_README.txt. Instructions from the
> >> readme.io <http://readme.io/> look much more complex to me; they don’t
> >> mention setup-hadoop.sh/bat at all, forcing the end user to perform a
> >> manual setup.
> >>
> >> —
> >> Denis
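The symlinking step Denis describes can be sketched as follows. This is a hedged approximation of what setup-hadoop does, with hypothetical paths; the real script also rewrites core-site.xml/mapred-site.xml, which this sketch deliberately omits:

```shell
# Sketch of setup-hadoop's symlinking step: expose the accelerator's jars to
# Hadoop by linking them into Hadoop's lib directory. Both paths are
# hypothetical arguments, e.g. $IGNITE_HOME/libs and
# $HADOOP_HOME/share/hadoop/common/lib.
link_ignite_jars() {
    ignite_libs=$1
    hadoop_libs=$2
    for jar in "$ignite_libs"/ignite-*.jar; do
        [ -f "$jar" ] || continue                      # glob may match nothing
        ln -sf "$jar" "$hadoop_libs/$(basename "$jar")"
    done
}
```

Note this needs write permission to the Hadoop lib folder, which is one of the objections to setup-hadoop raised above.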
> >>
> >>> On 14 дек. 2015 г., at 20:24, Dmitriy Setrakyan <[hidden email]
> >
> >> wrote:
> >>>
> >>> On Mon, Dec 14, 2015 at 7:28 AM, Denis Magda <[hidden email]>
> >> wrote:
> >>>
> >>>> Yes, this will be documented tomorrow. I want to go though all the
> steps
> >>>> by myself checking all other possible obstacles the user may face
> with.
> >>>>
> >>>
> >>> Thanks, Denis!
> >>>
> >>>
> >>>>
> >>>> —
> >>>> Denis
> >>>>
> >>>>> On 14 дек. 2015 г., at 18:11, Dmitriy Setrakyan <
> [hidden email]
> >>>
> >>>> wrote:
> >>>>>
> >>>>> Ivan, I think this should be documented, no?
> >>>>>
> >>>>> On Mon, Dec 14, 2015 at 2:25 AM, Ivan V. <[hidden email]>
> >>>> wrote:
> >>>>>
> >>>>>> To enable just IGFS persistence there is no need to use HDFS (which
> >>>>>> requires a Hadoop dependency, a configured HDFS cluster, etc.).
> >>>>>> We have requests https://issues.apache.org/jira/browse/IGNITE-1120,
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-1926 to implement
> >>>>>> persistence on top of the local file system, and we are already close
> >>>>>> to a solution.
> >>>>>>
> >>>>>> Regarding the secondary FS doc page (
> >>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system) I would
> >>>>>> suggest adding the following text there:
> >>>>>> ------------------------
> >>>>>> If an Ignite node with a secondary file system is configured on a
> >>>>>> machine with a Hadoop distribution, make sure Ignite is able to find
> >>>>>> the appropriate Hadoop libraries: set the HADOOP_HOME environment
> >>>>>> variable for the Ignite process if you're using the Apache Hadoop
> >>>>>> distribution, or, if you use another distribution (HDP, Cloudera,
> >>>>>> BigTop, etc.), make sure the file /etc/default/hadoop exists and has
> >>>>>> appropriate contents.
> >>>>>>
> >>>>>> If an Ignite node with a secondary file system is configured on a
> >>>>>> machine without a Hadoop distribution, you can manually add the
> >>>>>> necessary Hadoop dependencies to the Ignite node classpath: these are
> >>>>>> the dependencies of groupId "org.apache.hadoop" listed in
> >>>>>> modules/hadoop/pom.xml . Currently they are:
> >>>>>>
> >>>>>> 1. hadoop-annotations
> >>>>>> 2. hadoop-auth
> >>>>>> 3. hadoop-common
> >>>>>> 4. hadoop-hdfs
> >>>>>> 5. hadoop-mapreduce-client-common
> >>>>>> 6. hadoop-mapreduce-client-core
> >>>>>>
> >>>>>> ------------------------
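For reference, the configuration the thread keeps circling around could look roughly like this in Spring XML. This is a sketch, not taken from the docs: the class names match the 1.5-era API (IgniteHadoopIgfsSecondaryFileSystem lives in org.apache.ignite.hadoop.fs), but the HDFS URI is a placeholder, and depending on the Ignite version FileSystemConfiguration may require additional properties (e.g. cache names).

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="fileSystemConfiguration">
    <list>
      <bean class="org.apache.ignite.configuration.FileSystemConfiguration">
        <property name="name" value="igfs"/>
        <!-- HDFS as the secondary (persistent) file system; the URI below
             is a placeholder for your NameNode address. -->
        <property name="secondaryFileSystem">
          <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
            <constructor-arg value="hdfs://your-namenode:9000/"/>
          </bean>
        </property>
      </bean>
    </list>
  </property>
</bean>
```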

Re: Using HDFS as a secondary FS

Konstantin Boudnik-2
In reply to this post by Valentin Kulichenko
The integration with external systems like HDFS is a complex topic and cannot
generally be solved at the level of software that has no control over a
user's environment (yes, I am talking about Ignite). In Bigtop we are doing a
lot of this stuff, including the guarantees that the version of HDFS that
Ignite has been built against will be in the cluster, etc.

Generally speaking, if someone refuses to use orchestration and deployment
software similar to Bigtop, finding the correct libs is their own
responsibility. I would advise not to load extra modules nor to redistribute
libs from another project just to solve someone's inability to correctly
configure their own cluster.

Cos

On Fri, Dec 11, 2015 at 04:45PM, Valentin Kulichenko wrote:


signature.asc (237 bytes) Download Attachment