Learning to log for robust error checking

Notebooks are useful to keep track of what you did and what went wrong. Logging works in a similar fashion, and we can log errors and other useful information with the standard Python logging library.

For reproducible data analysis, it is good to know the modules our Python scripts import. In this recipe, I will introduce a minimal API from dautil that logs, on a best-effort basis, the package versions of imported modules.

Getting ready

In this recipe, we import NumPy and pandas (the demo program also imports matplotlib), so you may need to install them. See the Configuring pandas recipe for pandas installation instructions. Installation instructions for NumPy can be found at http://docs.scipy.org/doc/numpy/user/install.html (retrieved July 2015). Alternatively, install NumPy with pip using the following command:

$ [sudo] pip install numpy

The command for Anaconda users is as follows:

$ conda install numpy

I have installed NumPy 1.9.2 via Anaconda. We also require AppDirs to find the appropriate directory to store logs. Install it with the following command:

$ [sudo] pip install appdirs

I have AppDirs 1.4.0 on my system.

How to do it...

To log, we need to create and set up loggers. We can either set up the loggers with code or use a configuration file. Configuring loggers with code is the more flexible option, but configuration files tend to be more readable. I use the log.conf configuration file from dautil:

[loggers]
keys=root

[handlers]
keys=consoleHandler,fileHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=DEBUG
handlers=consoleHandler,fileHandler

[handler_consoleHandler]
class=StreamHandler
level=INFO
formatter=simpleFormatter
args=(sys.stdout,)

[handler_fileHandler]
class=dautil.log_api.VersionsLogFileHandler
formatter=simpleFormatter
args=('versions.log',)

[formatter_simpleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
datefmt=%d-%b-%Y

The file configures the root logger to log to a file at the DEBUG level and to the screen at the INFO level, so the file also receives DEBUG messages that the screen does not show. The file also specifies the format of the log messages. I created a tiny API in dautil, which creates a logger with its get_logger() function and uses it to log the package versions of a client program with its log() function. The code is in the log_api.py file of dautil:

from pkg_resources import DistributionNotFound
from pkg_resources import get_distribution
from pkg_resources import resource_filename
import logging
import logging.config
import os
from appdirs import AppDirs


def get_logger(name):
    # Load the log.conf file that ships inside the dautil package
    log_config = resource_filename(__name__, 'log.conf')
    logging.config.fileConfig(log_config)
    logger = logging.getLogger(name)

    return logger


def shorten(module_name):
    # 'numpy.version' -> 'numpy'; a name without a dot is returned unchanged
    return module_name.split('.')[0]


def log(modules, name):
    # Packages that are not relevant for data analysis
    skiplist = ['pkg_resources', 'distutils']

    logger = get_logger(name)
    logger.debug('Inside the log function')

    # Iterate over a copy, because sys.modules may change during the loop
    for k in list(modules.keys()):
        str_k = str(k)

        if '.version' in str_k:
            short = shorten(str_k)

            if short in skiplist:
                continue

            try:
                logger.info('%s=%s', short,
                            get_distribution(short).version)
            except DistributionNotFound:
                # Best effort: warn and carry on with the other modules
                logger.warning('Could not find a distribution for %s', short)


class VersionsLogFileHandler(logging.FileHandler):
    def __init__(self, fName):
        # Store the log file in the per-user log directory
        dirs = AppDirs("PythonDataAnalysisCookbook",
                       "Ivan Idris")
        path = dirs.user_log_dir
        # Show where the log file is written
        print(path)

        # makedirs also creates missing parent directories
        os.makedirs(path, exist_ok=True)

        super(VersionsLogFileHandler, self).__init__(
              os.path.join(path, fName))

The program that uses the API is in the log_demo.py file in this book's code bundle:

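import sys
# These packages are not used directly; they are imported
# so that they show up in sys.modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from dautil import log_api

# Use the script name as the logger name
log_api.log(sys.modules, sys.argv[0])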
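To try the configuration without dautil, here is a minimal sketch that uses only the standard library. It assumes a plain logging.FileHandler writing to a temporary directory in place of VersionsLogFileHandler, and it loads a log.conf-style configuration from a string, since fileConfig() accepts a file-like object as well as a filename. The DEBUG message should end up only in the file, while the INFO message goes to both the file and the screen.

```python
import io
import logging
import logging.config
import os
import tempfile

# Assumption: a plain FileHandler in a throwaway directory replaces
# dautil.log_api.VersionsLogFileHandler, so only the stdlib is needed.
log_path = os.path.join(tempfile.mkdtemp(), 'versions.log')

# Same sections as log.conf; the f-string fills in {log_path!r}.
# Config lines must not be indented, or ConfigParser treats them
# as continuation lines.
CONFIG = f"""
[loggers]
keys=root

[handlers]
keys=consoleHandler,fileHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=DEBUG
handlers=consoleHandler,fileHandler

[handler_consoleHandler]
class=StreamHandler
level=INFO
formatter=simpleFormatter
args=(sys.stdout,)

[handler_fileHandler]
class=FileHandler
formatter=simpleFormatter
args=({log_path!r},)

[formatter_simpleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
datefmt=%d-%b-%Y
"""

# fileConfig() accepts a file-like object as well as a filename.
logging.config.fileConfig(io.StringIO(CONFIG))

logger = logging.getLogger('log_demo')
logger.debug('Inside the log function')  # below INFO: file only
logger.info('An INFO message')           # file and screen

with open(log_path) as f:
    file_text = f.read()
print(file_text)
```

The file handler has no level of its own, so it accepts everything the root logger lets through, which is why the DEBUG record reaches the file.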

How it works...

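We configured a handler (VersionsLogFileHandler) that writes to a file and a handler (StreamHandler) that displays messages on the screen. StreamHandler is a class in the Python standard library; VersionsLogFileHandler extends the standard FileHandler class so that the log file goes in the per-user log directory reported by AppDirs. To configure the format of the log messages, we defined a formatter named simpleFormatter, which is an instance of the standard logging.Formatter class.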
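Loggers can also be set up in code instead of with log.conf. The following sketch wires up the same pieces by hand: a StreamHandler at INFO, a FileHandler at DEBUG, and one logging.Formatter shared by both. build_logger() is a hypothetical helper, a plain FileHandler stands in for VersionsLogFileHandler, and unlike log.conf the handlers are attached to a named logger rather than the root logger.

```python
import logging
import os
import sys
import tempfile


def build_logger(name, log_path):
    """Hypothetical helper: console at INFO, file at DEBUG, like log.conf."""
    # One Formatter shared by both handlers, matching simpleFormatter.
    formatter = logging.Formatter(
        '%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        datefmt='%d-%b-%Y')

    console = logging.StreamHandler(sys.stdout)
    console.setLevel(logging.INFO)        # screen: INFO and above
    console.setFormatter(formatter)

    file_handler = logging.FileHandler(log_path)
    file_handler.setLevel(logging.DEBUG)  # file: DEBUG and above
    file_handler.setFormatter(formatter)

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)        # let DEBUG records reach the handlers
    logger.addHandler(console)
    logger.addHandler(file_handler)
    logger.propagate = False              # avoid duplicates via root handlers
    return logger


log_path = os.path.join(tempfile.mkdtemp(), 'versions.log')
logger = build_logger('log_demo_code', log_path)
logger.debug('Inside the log function')   # file only
logger.info('An INFO message')            # file and screen

with open(log_path) as f:
    file_text = f.read()
```

Calling build_logger() twice with the same name would attach a second pair of handlers, so each message would be written twice. get_logger() in log_api.py does not have this problem, because fileConfig() removes the existing root handlers before installing the configured ones.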

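The API I made goes through the modules listed in the sys.modules dictionary and looks for names that contain .version, such as numpy.version. For each match, it shortens the name to the top-level package and tries to get that package's version with get_distribution(). Some packages, such as pkg_resources and distutils, are not relevant for data analysis, so we skip them. The log() function of the API logs a DEBUG level message with the debug() method; because the console handler only shows INFO and above, this message appears only in the log file. The info() method logs each package version at INFO level, so the versions appear both on the screen and in the file.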
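On newer Python versions you may prefer to avoid pkg_resources, which setuptools now documents as deprecated. The sketch below is a swapped-in technique, not the dautil code: it keeps the same '.version' filter and skip list but looks versions up with importlib.metadata (Python 3.8+). find_versions() and its lookup parameter are hypothetical names; lookup exists only so the version lookup can be replaced in tests. Like the original, it assumes the import name matches the distribution name, which is not always true (for example, the scikit-learn distribution is imported as sklearn).

```python
import importlib.metadata
import logging
import sys

logger = logging.getLogger(__name__)

# Same skip list as log_api.log(): not data analysis packages.
SKIPLIST = frozenset({'pkg_resources', 'distutils'})


def find_versions(modules, skiplist=SKIPLIST,
                  lookup=importlib.metadata.version):
    """Best-effort map of top-level package name -> installed version.

    Only module names containing '.version' (e.g. 'numpy.version') are
    considered, mirroring the filter in log_api.log().
    """
    versions = {}
    seen = set()
    for name in list(modules):  # snapshot: sys.modules can change mid-loop
        if '.version' not in name:
            continue
        short = name.split('.')[0]
        if short in skiplist or short in seen:
            continue
        seen.add(short)
        try:
            versions[short] = lookup(short)
        except importlib.metadata.PackageNotFoundError:
            # Best effort: warn and move on instead of failing.
            logger.warning('Could not find a distribution for %s', short)
    return versions


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    for package, version in sorted(find_versions(sys.modules).items()):
        logger.info('%s=%s', package, version)
```

Returning a dictionary instead of logging inside the loop keeps the discovery logic separate from the logging setup, so the same function could feed either the log file or a notebook cell.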

See also
