DatasetReader

class snap_ml_spark.DatasetReader.DatasetReader

Load a distributed dataset from a file.

load(file)

Load training data into memory.

Parameters: file (string) – filename

setFormat(format)

Specify the data format of the file. Accepted values: “snap”, “libsvm”, or “csv”.

Parameters: format (string) – data format

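
Of the accepted formats, “libsvm” is a widely used sparse text layout: each line holds a label followed by index:value pairs for the nonzero features. The plain-Python sketch below only illustrates that layout. It is not the parser used by snap_ml_spark, and the helper name parse_libsvm_line is hypothetical.

```python
def parse_libsvm_line(line):
    """Parse one line of libsvm-style text: '<label> <index>:<value> ...'."""
    parts = line.split()
    label = float(parts[0])          # first token is the label
    features = {}
    for item in parts[1:]:           # remaining tokens are index:value pairs
        index, value = item.split(":")
        features[int(index)] = float(value)
    return label, features


label, features = parse_libsvm_line("1 3:0.5 7:2.0")
```

Because a sparse line lists only nonzero features, the total feature count cannot always be inferred from the data alone.
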
setNumFt(x)

Set the number of features in the dataset.

Parameters: x (int) – number of features

takeRange(idx_start, idx_end)

To load only part of the dataset, specify the start and end sample indices.

Parameters:
  • idx_start (int) – first sample to load
  • idx_end (int) – last sample to load
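
The methods above can be combined as in the following minimal sketch. It rests on two assumptions that this reference does not state: that the setters are called before load(), and that load() returns the loaded data. The helper load_sample_range, the file name, and the numeric values are hypothetical.

```python
def load_sample_range(reader, path, data_format, num_features, idx_start, idx_end):
    """Configure a DatasetReader-like object and load part of a file.

    Assumption: setters are applied before load(), and load() returns
    the loaded data. Neither is confirmed by the reference above.
    """
    reader.setFormat(data_format)         # "snap", "libsvm", or "csv"
    reader.setNumFt(num_features)         # number of features in the dataset
    reader.takeRange(idx_start, idx_end)  # first and last sample to load
    return reader.load(path)


# Hypothetical usage (requires snap_ml_spark; file name and sizes are made up):
# from snap_ml_spark.DatasetReader import DatasetReader
# data = load_sample_range(DatasetReader(), "train.libsvm", "libsvm", 100, 0, 999)
```

Passing the reader in as an argument keeps the sketch independent of how the reader object is constructed.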