Fwd: NPE exception in KMeansTrainer

dmagda
Hey, ML experts,

Here is an ML issue reported on the user list. Please have a look.

--
Denis

---------- Forwarded message ---------
From: DocDVZ <[hidden email]>
Date: Mon, Aug 20, 2018 at 10:53 AM
Subject: NPE exception in KMeansTrainer
To: <[hidden email]>


Hello,

Since I'm new to data science, I'm not sure whether this is a bug or a
problem with my input data, so I decided to ask here for advice before
submitting a ticket. I tried to apply the k-means algorithm to my
bag-of-words data with ~8k features, so I copy-pasted some lines from the
example:

        IgniteCache<String, double[]> dataCache = ignite.cache(storageName);
        KMeansTrainer trainer = new KMeansTrainer().withSeed(1234L);
        KMeansModel mdl = trainer.fit(
                ignite,
                dataCache,
                (k, v) -> Arrays.copyOfRange(v, 1, v.length),
                (k, v) -> v[0]
        );

But this leads to a NullPointerException in KMeansTrainer.class:

Caused by: java.lang.NullPointerException
        at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.lambda$initClusterCentersRandomly$4dba08e1$1(KMeansTrainer.java:190)
        at org.apache.ignite.ml.dataset.impl.cache.CacheBasedDataset.computeForAllPartitions(CacheBasedDataset.java:158)
        at org.apache.ignite.ml.dataset.impl.cache.CacheBasedDataset.compute(CacheBasedDataset.java:122)
        at org.apache.ignite.ml.dataset.Dataset.compute(Dataset.java:102)
        at org.apache.ignite.ml.dataset.Dataset.compute(Dataset.java:156)
        at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.initClusterCentersRandomly(KMeansTrainer.java:186)
        at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.fit(KMeansTrainer.java:86)


at line:

        List<LabeledVector> rndPnts = dataset.compute(data -> {
            List<LabeledVector> rndPnt = new ArrayList<>();
            rndPnt.add(data.getRow(new Random(seed).nextInt(data.rowSize())));
            return rndPnt;
        }, (a, b) -> a == null ? b : Stream.concat(a.stream(), b.stream()).collect(Collectors.toList()));

The reducer receives a null value for b, and since there is no null check,
b.stream() throws an NPE. The Ignite version is 2.6. This looks like a bug
to me. Is there any way to work around this issue?
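
To illustrate the missing check described above, here is a minimal sketch of a
reducer that treats null on either side as "no partial result". It is plain
Java and only shows the null handling; it is not the actual Ignite fix, and the
names NullSafeMerge and merge are hypothetical, chosen for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not part of Apache Ignite.
final class NullSafeMerge {
    private NullSafeMerge() {}

    // Merges two partial results, treating null as "no result from that side".
    static <T> List<T> merge(List<T> a, List<T> b) {
        if (a == null)
            return b;          // may itself be null if both sides are empty
        if (b == null)
            return a;
        return Stream.concat(a.stream(), b.stream()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("p1");
        List<String> right = Arrays.asList("p2");

        System.out.println(merge(left, right)); // both sides present
        System.out.println(merge(left, null));  // right side missing
        System.out.println(merge(null, right)); // left side missing
        System.out.println(merge(null, null));  // nothing to merge
    }
}
```

With such a reducer the caller would still have to handle a null overall result
(for example, when no partition contributes a point), which this sketch does
not attempt.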



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Re: NPE exception in KMeansTrainer

Alexey Zinoviev
Thank you. I think I've found this bug (or one related to it) here:
https://issues.apache.org/jira/browse/IGNITE-9239
The fix will be delivered in 2.7 (currently it's in the master branch).

However, the NPE could still happen even with the current fix, so thanks a lot for the report.
I'll try to make a fix in a few days and merge it into the master branch, so you can try the updated KMeansTrainer.

To be 100% sure that the bug is fixed, could @DocDVZ share the approach or a code snippet
used to populate the cache?

I mean this cache:
IgniteCache<String, double[]> dataCache = ignite.cache(storageName);

Thank you.