Deployment¶
When deploying your dashboard it is better not to use the built-in flask
development server but use a more robust production server like gunicorn
or waitress
.
Probably gunicorn is a bit more fully featured and
faster but only works on unix/linux/osx, whereas
waitress also works
on Windows and has very minimal dependencies.
Install with either pip install gunicorn
or pip install waitress
.
Storing explainer and running default dashboard with gunicorn¶
Before you start a dashboard with gunicorn you need to store both the explainer instance and and a configuration for the dashboard:
from explainerdashboard import ClassifierExplainer, ExplainerDashboard
explainer = ClassifierExplainer(model, X, y)
db = ExplainerDashboard(explainer, title="Cool Title", shap_interaction=False)
db.to_yaml("dashboard.yaml", explainerfile="explainer.joblib", dump_explainer=True)
Now you re-load your dashboard and expose a flask server as app
in dashboard.py
:
from explainerdashboard import ExplainerDashboard
db = ExplainerDashboard.from_config("dashboard.yaml")
app = db.flask_server()
If you named the file above dashboard.py
, you can now start the gunicorn server with:
$ gunicorn dashboard:app
If you want to run the server server with for example three workers, binding to
port 8050
you launch gunicorn with:
$ gunicorn -w 3 -b localhost:8050 dashboard:app
If you now point your browser to http://localhost:8050
you should see your dashboard.
Next step is finding a nice url in your organization’s domain, and forwarding it
to your dashboard server.
With waitress you would call:
$ waitress-serve --port=8050 dashboard:app
Although you can all use the waitress
directly from the dashboard by passing
the use_waitress=True
flag to .run()
:
ExplainerDashboard(explainer).run(use_waitress=True)
Deploying dashboard as part of Flask app on specific route¶
Another way to deploy the dashboard is to first start a Flask
app, and then
use this app as the backend of the Dashboard, and host the dashboard on a specific
route. This way you can for example host multiple dashboard under different urls.
You need to pass the Flask server
instance and the url_base_pathname
to the
ExplainerDashboard
constructor, and then the dashboard itself can be found
under db.app.index
:
from flask import Flask
app = Flask(__name__)
[...]
db = ExplainerDashboard(explainer, server=app, url_base_pathname="/dashboard/")
@app.route('/dashboard')
def return_dashboard():
return db.app.index()
Now you can start the dashboard by:
$ gunicorn -b localhost:8050 dashboard:app
And you can visit the dashboard on http://localhost:8050/dashboard
.
Deploying to heroku¶
In case you would like to deploy to heroku (which is normally the simplest option for dash apps, see dash instructions here). The demonstration dashboard is also hosted on heroku at titanicexplainer.herokuapp.com.
In order to deploy the heroku there are a few things to keep in mind. First of
all you need to add explainerdashboard
and gunicorn
to
requirements.txt
(pinning is recommended to force a new build of your environment
whenever you upgrade versions):
explainerdashboard==0.3.1
gunicorn
Select a python runtime compatible with the version that you used to pickle
your explainer in runtime.txt
:
python-3.8.6
(supported versions as of this writing are python-3.9.0
, python-3.8.6
,
python-3.7.9
and python-3.6.12
, but check the
heroku documentation
for the latest)
And you need to tell heroku how to start your server in Procfile
:
web: gunicorn dashboard:app
Graphviz buildpack¶
If you want to visualize individual trees inside your RandomForest
or xgboost
model using the dtreeviz
package you will
need to make sure that graphviz
is installed on your heroku
dyno by
adding the following buildstack (as well as the python
buildpack):
https://github.com/weibeld/heroku-buildpack-graphviz.git
(you can add buildpacks through the “settings” page of your heroku project)
Docker deployment¶
You can also deploy a dashboard using docker. You can build the dashboard and store it inside the container to make sure it is compatible with the container environment. E.g. generate_dashboard.py:
from sklearn.ensemble import RandomForestClassifier
from explainerdashboard import *
from explainerdashboard.datasets import *
X_train, y_train, X_test, y_test = titanic_survive()
model = RandomForestClassifier(n_estimators=50, max_depth=5).fit(X_train, y_train)
explainer = ClassifierExplainer(model, X_test, y_test,
cats=["Sex", 'Deck', 'Embarked'],
labels=['Not Survived', 'Survived'],
descriptions=feature_descriptions)
db = ExplainerDashboard(explainer)
db.to_yaml("dashboard.yaml", explainerfile="explainer.joblib", dump_explainer=True)
run_dashboard.py:
from explainerdashboard import ExplainerDashboard
db = ExplainerDashboard.from_config("dashboard.yaml")
db.run(host='0.0.0.0', port=9050, use_waitress=True)
Dockerfile:
FROM python:3.8
RUN pip install explainerdashboard
COPY generate_dashboard.py ./
COPY run_dashboard.py ./
RUN python generate_dashboard.py
EXPOSE 9050
CMD ["python", "./run_dashboard.py"]
And build and run the container exposing port 9050
:
$ docker build -t explainerdashboard .
$ docker run -p 9050:9050 explainerdashboard
Reducing memory usage¶
If you deploy the dashboard with a large dataset with a large number of rows (n
)
and a large number of columns (m
),
it can use up quite a bit of memory: the dataset itself, shap values,
shap interaction values and any other calculated properties are alle kept in
memory in order to make the dashboard responsive. You can check the (approximate)
memory usage with explainer.memory_usage()
. In order to reduce the memory
footprint there are a number of things you can do:
- Not including shap interaction tab.
Shap interaction values are shape
n*m*m
, so can take a subtantial amount of memory, especially if you have a significant amount of columnsm
.
- Setting a lower precision.
By default shap values are stored as
'float64'
, but you can store them as'float32'
instead and save half the space:`ClassifierExplainer(model, X_test, y_test, precision='float32')`
. You can also set a lower precision on yourX_test
dataset yourself ofcourse.
- Drop non-positive class shap values.
For multi class classifiers, by default
ClassifierExplainer
calculates shap values for all classes. If you are only interested in a single class you can drop the other shap values withexplainer.keep_shap_pos_label_only(pos_label)
- Storing row data externally and loading on the fly.
You can for example only store a subset of
10.000
rows in theexplainer
itself (enough to generate representative importance and dependence plots), and store the rest of your millions of rows of input data in an external file or database that get loaded one by one with the following functions:with
explainer.set_X_row_func()
you can set a function that takes an index as argument and returns a single row dataframe with model compatible input data for that index. This function can include a query to a database or fileread.with
explainer.set_y_func()
you can set a function that takes and index as argument and returns the observed outcomey
for that index.with
explainer.set_index_list_func()
you can set a function that returns a list of available indexes that can be queried.
If the number of indexes is too long to fit in a dropdown you can pass
index_dropdown=False
which turns the dropdowns into free text fields. Instead of anindex_list_func
you can also set anexplainer.set_index_check_func(func)
which should return a bool whether theindex
exists or not.Important: these function can be called multiple times by multiple independent components, so probably best to implement some kind of caching functionality. The functions you pass can be also methods, so you have access to all of the internals of the explainer.
Setting logins and password¶
ExplainerDashboard
supports dash basic auth functionality.
ExplainerHub
uses flask_simple_login
for its user authentication.
You can simply add a list of logins to the ExplainerDashboard
to force a login
and prevent random users from accessing the details of your model dashboard:
ExplainerDashboard(explainer, logins=[['login1', 'password1'], ['login2', 'password2']]).run()
Whereas ExplainerHub has somewhat more intricate user management
using FlaskLogin
, but the basic syntax is the same. See the
ExplainerHub documetation for more details:
hub = ExplainerHub([db1, db2], logins=[['login1', 'password1'], ['login2', 'password2']])
Make sure not to check these login/password pairs into version control though,
but store them somewhere safe! ExplainerHub
stores passwords into a hashed
format by default.
Automatically restart gunicorn server upon changes¶
We can use the explainerdashboard
CLI tools to automatically rebuild our
explainer whenever there is a change to the underlying
model, dataset or explainer configuration. And we we can use kill -HUP gunicorn.pid
to force the gunicorn to restart and reload whenever a new explainer.joblib
is generated or the dashboard configuration dashboard.yaml
changes. These two
processes together ensure that the dashboard automatically updates whenever there
are underlying changes.
First we store the explainer config in explainer.yaml
and the dashboard
config in dashboard.yaml
. We also indicate which modelfiles and datafiles the
explainer depends on, and which columns in the datafile should be used as
a target and which as index:
explainer = ClassifierExplainer(model, X, y, labels=['Not Survived', 'Survived'])
explainer.dump("explainer.joblib")
explainer.to_yaml("explainer.yaml",
modelfile="model.pkl",
datafile="data.csv",
index_col="Name",
target_col="Survival",
explainerfile="explainer.joblib",
dashboard_yaml="dashboard.yaml")
db = ExplainerDashboard(explainer, [ShapDependenceTab, ImportancesTab], title="Custom Title")
db.to_yaml("dashboard.yaml", explainerfile="explainer.joblib")
The dashboard.py
is the same as before and simply loads an ExplainerDashboard
directly from the config file:
from explainerdashboard import ExplainerDashboard
db = ExplainerDashboard.from_config("dashboard.yaml")
app = db.app.server
Now we would like to rebuild the explainer.joblib
file whenever there is a
change to model.pkl
, data.csv
or explainer.yaml
by running
explainerdashboard build
. And we restart the gunicorn
server whenever
there is a change in explainer.joblib
or dashboard.yaml
by killing
the gunicorn server with kill -HUP pid
To do that we need to install
the python package watchdog
(pip install watchdog[watchmedo]
). This
package can keep track of filechanges and execute shell-scripts upon file changes.
So we can start the gunicorn server and the two watchdog filechange trackers
from a shell script start_server.sh
:
trap "kill 0" EXIT # ensures that all three process are killed upon exit
source venv/bin/activate # activate virtual environment first
gunicorn --pid gunicorn.pid gunicorn_dashboard:app &
watchmedo shell-command -p "./model.pkl;./data.csv;./explainer.yaml" -c "explainerdashboard build explainer.yaml" &
watchmedo shell-command -p "./explainer.joblib;./dashboard.yaml" -c 'kill -HUP $(cat gunicorn.pid)' &
wait # wait till user hits ctrl-c to exit and kill all three processes
Now we can simply run chmod +x start_server.sh
and ./start_server.sh
to
get our server up and running.
Whenever we now make a change to either one of the source files
(model.pkl
, data.csv
or explainer.yaml
), this produces a fresh
explainer.joblib
. And whenever there is a change to either explainer.joblib
or dashboard.yaml
gunicorns restarts and rebuild the dashboard.
So you can keep an explainerdashboard running without interuption and simply
an updated model.pkl
or a fresh dataset data.csv
into the directory and
the dashboard will automatically update.