Folks,
In the current Hadoop Accelerator design we always process user jobs in a separate classloader called HadoopClassLoader. It is somewhat special because it always loads Hadoop classes from scratch.

This leads to at least two serious problems:
1) Very high permgen/metaspace load. The only workaround is to give the JVM more permgen.
2) Native Hadoop libraries cannot be used. There are quite a few native methods in Hadoop, and the corresponding dll/so files are loaded in static class initializers. As each HadoopClassLoader loads classes over and over again, the libraries are loaded several times as well. But Java does not allow the same native library to be loaded from different classloaders, so the result is JNI linkage errors. For instance, this affects the Snappy compress/decompress library, which is pretty important in the Hadoop ecosystem.

Clearly, this isolation with a custom classloader was done on purpose, and I understand why it is important, for example, for user-defined classes.

But why do we load Hadoop classes (e.g. org.apache.hadoop.fs.FileSystem) multiple times? Does anyone have a clue?

Vladimir.
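[Editor's note: a minimal, self-contained sketch of the native-library failure Vladimir describes. The class demo.SnappyLikeCodec and the classpath URL are hypothetical stand-ins for a Hadoop native codec; the class is assumed to call System.loadLibrary() in its static initializer, the way Hadoop's native code does. Because the JVM binds a native library to the classloader that first loaded it, the second definition fails.]

    import java.net.URL;
    import java.net.URLClassLoader;

    // Hypothetical codec class, compiled separately to /tmp/demo-classes/:
    //
    //   package demo;
    //   public class SnappyLikeCodec {
    //       static { System.loadLibrary("snappylike"); } // loads libsnappylike.so
    //   }

    public class NativeReloadDemo {
        public static void main(String[] args) throws Exception {
            URL[] cp = { new URL("file:/tmp/demo-classes/") }; // hypothetical path

            // Parent 'null' makes each loader define the class itself instead
            // of delegating upward -- this mirrors how HadoopClassLoader
            // re-defines Hadoop classes from scratch.
            ClassLoader cl1 = new URLClassLoader(cp, null);
            ClassLoader cl2 = new URLClassLoader(cp, null);

            // First definition runs the static initializer;
            // System.loadLibrary succeeds and binds the library to cl1.
            Class.forName("demo.SnappyLikeCodec", true, cl1);

            // Second definition re-runs the static initializer; the JVM
            // refuses to bind the same .so/.dll to a second classloader:
            //   java.lang.UnsatisfiedLinkError: Native Library ...
            //   already loaded in another classloader
            Class.forName("demo.SnappyLikeCodec", true, cl2);
        }
    }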
Initially this was done to support a multithreaded model of running tasks, as opposed to Hadoop's original multiprocess model. And as far as I remember, there were attempts to reuse Hadoop classes, but they failed.

As for the high permgen load, it should not actually be that high: the number of classloaders should be only slightly higher than the number of concurrently running tasks, since task classloaders are pooled and reused.

As for native code, I'm not sure what can be done here. I think environments like OSGi (Eclipse, etc.) have the same issue; maybe we can look at what they do in the case of native dependencies in bundles?

Sergi
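[Editor's note: the class reuse Sergi mentions would amount to parent-first delegation for shared Hadoop packages. Below is a minimal sketch, not Ignite's actual HadoopClassLoader; the org.apache.hadoop package filter is an assumption. It isolates user job classes while letting each Hadoop class, and any native library it loads, be defined exactly once per JVM.]

    import java.net.URL;
    import java.net.URLClassLoader;

    public class JobClassLoader extends URLClassLoader {
        public JobClassLoader(URL[] jobJars, ClassLoader parent) {
            super(jobJars, parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                // Shared infrastructure classes go to the parent, which keeps
                // a single definition of e.g. org.apache.hadoop.fs.FileSystem.
                if (name.startsWith("org.apache.hadoop."))
                    return super.loadClass(name, resolve); // parent-first

                // User/job classes: try this loader first, for isolation.
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    try {
                        c = findClass(name);
                    }
                    catch (ClassNotFoundException e) {
                        c = super.loadClass(name, resolve); // fall back to parent
                    }
                }
                if (resolve)
                    resolveClass(c);
                return c;
            }
        }
    }

[The trade-off is exactly what this thread is about: sharing Hadoop classes also shares their static state across concurrently running tasks, which is presumably why the earlier reuse attempts Sergi recalls failed.]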