DatasetReader

class snap_ml_spark.DatasetReader.DatasetReader

Load a distributed dataset from a file.

load(file)

Load the training data into memory.

Parameters

file (string) – name of the file to load

setFormat(format)

Specify the data format of the file. Supported values: “snap”, “libsvm”, or “csv”.

Parameters

format (string) – data format
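For reference, a LIBSVM-format file stores one sample per line: a label followed by sparse index:value pairs, with feature indices conventionally starting at 1. The sketch below parses one such line in plain Python to show the layout. The helper parse_libsvm_line is hypothetical and does not reflect how DatasetReader parses files internally.

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse one LIBSVM-format line: '<label> <index>:<value> ...'.

    Hypothetical helper for illustration only; not part of snap_ml_spark.
    """
    tokens = line.strip().split()
    # The first token is the label; the rest are sparse 'index:value' pairs.
    label = float(tokens[0])
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


label, features = parse_libsvm_line("1 3:0.5 7:1.25")
print(label, features)  # 1.0 {3: 0.5, 7: 1.25}
```

Features that do not appear on a line are implicitly zero, which is what makes the format compact for sparse data.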

setNumFt(x)

Set the number of features in the dataset.

Parameters

x (int) – number of features
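If the number of features is not known in advance, one way to find it for a LIBSVM-format file is to scan the file once for the largest feature index. The minimal sketch below assumes 1-based indices, so that the largest index equals the number of features; this is an assumption, not something the reference above states. The helper count_features is hypothetical.

```python
from typing import Iterable


def count_features(lines: Iterable[str]) -> int:
    """Return the largest feature index found in LIBSVM-format lines.

    Hypothetical helper. Assumes 1-based indices, so the largest index
    equals the number of features; with 0-based data, add 1.
    """
    max_index = 0
    for line in lines:
        # Skip the label (first token) and read only the 'index:value' pairs.
        for token in line.split()[1:]:
            index = int(token.split(":", 1)[0])
            max_index = max(max_index, index)
    return max_index


sample = ["1 3:0.5 7:1.25", "0 2:1.0 10:0.1", "1 5:2.0"]
print(count_features(sample))  # 10
```

With a real file, pass the open file object (for example inside `with open(path) as f:`), since iterating over it yields one line at a time; the result is then the value you would give to setNumFt.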

takeRange(idx_start, idx_end)

If only part of the dataset should be loaded, specify the indices of the first and last samples to load.

Parameters
  • idx_start (int) – first sample to load

  • idx_end (int) – last sample to load
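The sketch below illustrates the selection that takeRange describes on an in-memory list. It assumes zero-based sample indices and, as "last sample to load" suggests, an inclusive idx_end; neither is confirmed by this reference. The helper take_range is hypothetical and does not call snap_ml_spark.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def take_range(samples: Sequence[T], idx_start: int, idx_end: int) -> List[T]:
    """Return samples idx_start..idx_end, treating idx_end as inclusive.

    Hypothetical helper mirroring the documented takeRange parameters;
    assumes zero-based indices.
    """
    if idx_start < 0 or idx_end < idx_start:
        raise ValueError("expected 0 <= idx_start <= idx_end")
    # idx_end is the last sample to keep, so slice up to idx_end + 1.
    return list(samples[idx_start : idx_end + 1])


rows = ["s0", "s1", "s2", "s3", "s4"]
print(take_range(rows, 1, 3))  # ['s1', 's2', 's3']
```

If your version of the library treats idx_end as exclusive, the same call would return one sample fewer, so it is worth checking the sample count after loading.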