LogisticRegression

class snap_ml_spark.estimator.LogisticRegression(featuresCol='features', labelCol='label', predictionCol='prediction', trainingHistory=0, maxIter=1000, regParam=1.0, elasticNetParam=0.0, probabilityCol='probability', rawPredictionCol='rawPrediction', family='auto', tol=0.001, useGpu=False, dual=True, balanced=False, nthreads=-1, gpuMemLimit=0, verbose=False)

Logistic regression. This class supports multinomial logistic (softmax) and binomial logistic regression.

>>> from snap_ml_spark.estimator import LogisticRegression
>>> from pyspark.sql import Row
>>> from pyspark.ml.linalg import Vectors
>>> bdf = sc.parallelize([
...     Row(label=1.0, weight=1.0, features=Vectors.dense(0.0, 5.0)),
...     Row(label=0.0, weight=2.0, features=Vectors.dense(1.0, 2.0)),
...     Row(label=1.0, weight=3.0, features=Vectors.dense(2.0, 1.0)),
...     Row(label=0.0, weight=4.0, features=Vectors.dense(3.0, 3.0))]).toDF()
>>> blor = LogisticRegression(regParam=0.01, weightCol="weight")
>>> blorModel = blor.fit(bdf)
>>> blorModel.coefficients
DenseVector([-1.080..., -0.646...])
>>> blorModel.intercept
3.112...
>>> data_path = "data/mllib/sample_multiclass_classification_data.txt"
>>> mdf = spark.read.format("libsvm").load(data_path)
>>> mlor = LogisticRegression(regParam=0.1, elasticNetParam=1.0, family="multinomial")
>>> mlorModel = mlor.fit(mdf)
>>> mlorModel.coefficientMatrix
SparseMatrix(3, 4, [0, 1, 2, 3], [3, 2, 1], [1.87..., -2.75..., -0.50...], 1)
>>> mlorModel.interceptVector
DenseVector([0.04..., -0.42..., 0.37...])
>>> test0 = sc.parallelize([Row(features=Vectors.dense(-1.0, 1.0))]).toDF()
>>> result = blorModel.transform(test0).head()
>>> result.prediction
1.0
>>> result.probability
DenseVector([0.02..., 0.97...])
>>> result.rawPrediction
DenseVector([-3.54..., 3.54...])
>>> test1 = sc.parallelize([Row(features=Vectors.sparse(2, [0], [1.0]))]).toDF()
>>> blorModel.transform(test1).head().prediction
1.0
>>> blor.setParams("vector")
Traceback (most recent call last):
    ...
TypeError: Method setParams forces keyword arguments.
>>> lr_path = temp_path + "/lr"
>>> blor.save(lr_path)
>>> lr2 = LogisticRegression.load(lr_path)
>>> lr2.getRegParam()
0.01
>>> model_path = temp_path + "/lr_model"
>>> blorModel.save(model_path)
>>> model2 = LogisticRegressionModel.load(model_path)
>>> blorModel.coefficients[0] == model2.coefficients[0]
True
>>> blorModel.intercept == model2.intercept
True
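
The binomial outputs in the example above can be checked by hand. The following is a minimal sketch, not the library's implementation: it plugs the truncated coefficients and intercept printed in the doctest into the logistic function for the test0 row. The [-margin, margin] layout of rawPrediction and the 0.5 decision threshold are assumptions that happen to agree with the printed output.

```python
import math

# Values truncated from the doctest output above; treat them as approximate.
coefficients = [-1.080, -0.646]
intercept = 3.112
features = [-1.0, 1.0]  # the test0 row

# Linear margin for the positive class: w . x + b
margin = sum(w * x for w, x in zip(coefficients, features)) + intercept

# Assumed layout of the binomial rawPrediction: [-margin, margin].
raw_prediction = [-margin, margin]

# The logistic (sigmoid) function maps the margin to P(label = 1).
p1 = 1.0 / (1.0 + math.exp(-margin))
probability = [1.0 - p1, p1]

# Assumed decision rule: predict class 1 when its probability exceeds 0.5.
prediction = 1.0 if p1 > 0.5 else 0.0

print(raw_prediction, probability, prediction)
```

The computed margin is about 3.55 and the class-1 probability is about 0.97, in line with the rawPrediction and probability values shown for test0.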

getBalanced()

Gets the value of balanced or its default value.

getDual()

Gets the value of dual or its default value.

getFamily()

Gets the value of family or its default value.

getGpuMemLimit()

Gets the value of gpuMemLimit or its default value.

getNthreads()

Gets the value of nthreads or its default value.

getTrainingHistory()

Gets the value of trainingHistory or its default value.

getUseGpu()

Gets the value of useGpu or its default value.

getVerbose()

Gets the value of verbose or its default value.

setBalanced(value)

Sets the value of balanced.

setDual(value)

Sets the value of dual.

setFamily(value)

Sets the value of family.

setGpuMemLimit(value)

Sets the value of gpuMemLimit.

setNthreads(value)

Sets the value of nthreads.

setParams(featuresCol='features', labelCol='label', predictionCol='prediction', trainingHistory=0, maxIter=1000, regParam=1.0, elasticNetParam=0.0, tol=0.001, probabilityCol='probability', rawPredictionCol='rawPrediction', family='auto', useGpu=False, dual=True, balanced=False, nthreads=1, gpuMemLimit=0, verbose=False)

setParams(self, featuresCol="features", labelCol="label", predictionCol="prediction", trainingHistory=0, maxIter=1000, regParam=1.0, elasticNetParam=0.0, tol=1e-3, probabilityCol="probability", rawPredictionCol="rawPrediction", family="auto", useGpu=False, dual=True, balanced=False, nthreads=-1, gpuMemLimit=0, verbose=False): Sets params for logistic regression. If the threshold and thresholds Params are both set, they must be equivalent.
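
The class example shows that setParams rejects positional arguments with a TypeError. The sketch below illustrates that behavior with a hypothetical keyword_only decorator and a toy estimator. Both names are illustrative stand-ins, and nothing here is taken from the snap_ml_spark source.

```python
import functools


def keyword_only(func):
    """Hypothetical decorator that rejects positional arguments."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            raise TypeError(
                "Method %s forces keyword arguments." % func.__name__)
        return func(self, **kwargs)
    return wrapper


class ToyEstimator:
    """Illustrative only; not the snap_ml_spark class."""

    def __init__(self):
        # Two defaults copied from the documented signature.
        self.params = {"regParam": 1.0, "maxIter": 1000}

    @keyword_only
    def setParams(self, **kwargs):
        # Overwrite only the params that were passed explicitly.
        self.params.update(kwargs)
        return self


est = ToyEstimator()
est.setParams(regParam=0.01)   # accepted: keyword argument

try:
    est.setParams("vector")    # rejected: positional argument
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(est.params, error_message)
```

Params that are not passed keep their previous values, so only regParam changes in the toy estimator.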

setTrainingHistory(value)

Sets the value of trainingHistory.

setUseGpu(value)

Sets the value of useGpu.

setVerbose(value)

Sets the value of verbose.