LogisticRegression
-
class snap_ml_spark.estimator.LogisticRegression(featuresCol='features', labelCol='label', predictionCol='prediction', trainingHistory=0, maxIter=1000, regParam=1.0, elasticNetParam=0.0, probabilityCol='probability', rawPredictionCol='rawPrediction', family='auto', tol=0.001, useGpu=False, dual=True, balanced=False, nthreads=-1, gpuMemLimit=0, verbose=False)
Logistic regression. This class supports both multinomial (softmax) and binomial logistic regression.
>>> from pyspark.sql import Row
>>> from pyspark.ml.linalg import Vectors
>>> bdf = sc.parallelize([
...     Row(label=1.0, weight=1.0, features=Vectors.dense(0.0, 5.0)),
...     Row(label=0.0, weight=2.0, features=Vectors.dense(1.0, 2.0)),
...     Row(label=1.0, weight=3.0, features=Vectors.dense(2.0, 1.0)),
...     Row(label=0.0, weight=4.0, features=Vectors.dense(3.0, 3.0))]).toDF()
>>> blor = LogisticRegression(regParam=0.01, weightCol="weight")
>>> blorModel = blor.fit(bdf)
>>> blorModel.coefficients
DenseVector([-1.080..., -0.646...])
>>> blorModel.intercept
3.112...
>>> data_path = "data/mllib/sample_multiclass_classification_data.txt"
>>> mdf = spark.read.format("libsvm").load(data_path)
>>> mlor = LogisticRegression(regParam=0.1, elasticNetParam=1.0, family="multinomial")
>>> mlorModel = mlor.fit(mdf)
>>> mlorModel.coefficientMatrix
SparseMatrix(3, 4, [0, 1, 2, 3], [3, 2, 1], [1.87..., -2.75..., -0.50...], 1)
>>> mlorModel.interceptVector
DenseVector([0.04..., -0.42..., 0.37...])
>>> test0 = sc.parallelize([Row(features=Vectors.dense(-1.0, 1.0))]).toDF()
>>> result = blorModel.transform(test0).head()
>>> result.prediction
1.0
>>> result.probability
DenseVector([0.02..., 0.97...])
>>> result.rawPrediction
DenseVector([-3.54..., 3.54...])
>>> test1 = sc.parallelize([Row(features=Vectors.sparse(2, [0], [1.0]))]).toDF()
>>> blorModel.transform(test1).head().prediction
1.0
>>> blor.setParams("vector")
Traceback (most recent call last):
    ...
TypeError: Method setParams forces keyword arguments.
>>> lr_path = temp_path + "/lr"
>>> blor.save(lr_path)
>>> lr2 = LogisticRegression.load(lr_path)
>>> lr2.getRegParam()
0.01
>>> model_path = temp_path + "/lr_model"
>>> blorModel.save(model_path)
>>> model2 = LogisticRegressionModel.load(model_path)
>>> blorModel.coefficients[0] == model2.coefficients[0]
True
>>> blorModel.intercept == model2.intercept
True
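As a rough cross-check of the binomial outputs in the example above, the sketch below recomputes them by hand. It assumes the usual Spark ML convention that rawPrediction is [-margin, margin] with margin = coefficients · features + intercept, and that probability applies the logistic sigmoid to the margin. The coefficient and intercept values are only the truncated digits visible in the doctest, so the results are approximate, and nothing here calls snap_ml_spark itself.

```python
import math

# Truncated values copied from the doctest (the trailing digits are elided there),
# so everything computed below is approximate.
coefficients = [-1.080, -0.646]
intercept = 3.112
features = [-1.0, 1.0]  # the test0 row

# Linear margin: dot(coefficients, features) + intercept.
margin = sum(w * x for w, x in zip(coefficients, features)) + intercept

# Assumed convention: the raw prediction is [-margin, margin].
raw_prediction = [-margin, margin]

# The logistic sigmoid maps the margin to the probability of class 1.
p1 = 1.0 / (1.0 + math.exp(-margin))
probability = [1.0 - p1, p1]

# The predicted label is the index of the larger probability.
prediction = float(probability.index(max(probability)))

print(raw_prediction, probability, prediction)
```

Under these assumptions the margin comes out near 3.546 and the class-1 probability near 0.97, which agrees with the rawPrediction and probability values shown in the doctest.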
-
getBalanced()
Gets the value of balanced or its default value.
-
getDual()
Gets the value of dual or its default value.
-
getFamily()
Gets the value of family or its default value.
-
getGpuMemLimit()
Gets the value of gpuMemLimit or its default value.
-
getNthreads()
Gets the value of nthreads or its default value.
-
getTrainingHistory()
Gets the value of trainingHistory or its default value.
-
getUseGpu()
Gets the value of useGpu or its default value.
-
getVerbose()
Gets the value of verbose or its default value.
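The getters above all describe the same lookup rule: return the explicitly set value if there is one, otherwise the default. The sketch below illustrates that rule in plain Python. `ParamsSketch`, `set`, and `get_or_default` are hypothetical names invented for this illustration and are not part of snap_ml_spark; the defaults are taken from the constructor signature documented on this page.

```python
class ParamsSketch:
    """Hypothetical stand-in for an estimator's parameter store."""

    # Defaults copied from the documented constructor signature.
    _defaults = {
        "balanced": False,
        "dual": True,
        "family": "auto",
        "gpuMemLimit": 0,
        "nthreads": -1,
        "trainingHistory": 0,
        "useGpu": False,
        "verbose": False,
    }

    def __init__(self):
        # Values the user has set explicitly; empty until set() is called.
        self._explicit = {}

    def set(self, name, value):
        # Reject names that are not known parameters.
        if name not in self._defaults:
            raise KeyError(name)
        self._explicit[name] = value
        return self

    def get_or_default(self, name):
        # Explicit value wins; otherwise fall back to the default.
        return self._explicit.get(name, self._defaults[name])


params = ParamsSketch()
before = params.get_or_default("useGpu")  # nothing set yet, so the default is used
params.set("useGpu", True)
after = params.get_or_default("useGpu")   # the explicit value now takes precedence
print(before, after)
```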
-
setBalanced(value)
Sets the value of balanced.
-
setDual(value)
Sets the value of dual.
-
setFamily(value)
Sets the value of family.
-
setGpuMemLimit(value)
Sets the value of gpuMemLimit.
-
setNthreads(value)
Sets the value of nthreads.
-
setParams(featuresCol='features', labelCol='label', predictionCol='prediction', trainingHistory=0, maxIter=1000, regParam=1.0, elasticNetParam=0.0, tol=0.001, probabilityCol='probability', rawPredictionCol='rawPrediction', family='auto', useGpu=False, dual=True, balanced=False, nthreads=1, gpuMemLimit=0, verbose=False)
setParams(self, featuresCol="features", labelCol="label", predictionCol="prediction", trainingHistory=0, maxIter=1000, regParam=1.0, elasticNetParam=0.0, tol=1e-3, probabilityCol="probability", rawPredictionCol="rawPrediction", family="auto", useGpu=False, dual=True, balanced=False, nthreads=-1, gpuMemLimit=0, verbose=False)
Sets params for logistic regression. If the threshold and thresholds Params are both set, they must be equivalent.
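The class example shows that calling setParams with a positional argument raises "TypeError: Method setParams forces keyword arguments." The sketch below shows one way such a check can be written, modeled loosely on the keyword-only decorator pattern used in PySpark. `EstimatorSketch` and its two parameters are hypothetical and only serve the illustration; how snap_ml_spark actually enforces this is not stated here.

```python
import functools


def keyword_only(func):
    """Reject positional arguments other than self (illustrative only)."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        return func(self, **kwargs)

    return wrapper


class EstimatorSketch:
    """Hypothetical estimator with two of the documented parameters."""

    def __init__(self):
        self.params = {"regParam": 1.0, "maxIter": 1000}

    @keyword_only
    def setParams(self, regParam=1.0, maxIter=1000):
        # In this sketch, omitted keywords are reset to their defaults.
        self.params.update(regParam=regParam, maxIter=maxIter)
        return self


# Keyword arguments are accepted.
est = EstimatorSketch().setParams(regParam=0.01)

# A positional argument is rejected with a TypeError.
try:
    est.setParams("vector")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(est.params, error_message)
```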
-
setTrainingHistory(value)
Sets the value of trainingHistory.
-
setUseGpu(value)
Sets the value of useGpu.
-
setVerbose(value)
Sets the value of verbose.
-