
NLP workflow template

Many of us are drawn to Natural Language Processing for the sheer intellectual challenge it offers across research and engineering. To measure our progress, it is valuable to have a workflow with rough time estimates for each step. In this short section, we will briefly outline what a typical NLP process, and indeed most applied machine learning processes, looks like.

Most people I've learned from like to use a (roughly) five-step process:

  • Understanding the problem
  • Understanding and preparing data
  • Quick wins: proofs of concept
  • Iterating and improving the results
  • Evaluation and deployment

This is just a process template. It leaves a lot of room for customization to fit the engineering culture in your company. Any of these steps can be broken down further. For instance, understanding and preparing data can be split into analysis and cleaning. Similarly, the proof of concept step may involve multiple experiments, followed by a demo or a report on the best results from them.

Although this looks like a strictly linear process, it is not. More often than not, you will want to revisit an earlier step and change a parameter or a particular data transform to see its effect on later performance.

To make this possible, factor the cyclic nature of the process into your code. Write code with well-designed abstractions, so that each component can be reused and replaced independently of the others.
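As a rough illustration of what independently reusable components might look like, here is a minimal sketch in Python. The names `lowercase`, `strip_whitespace`, and `run_pipeline` are hypothetical and chosen only for this example; they do not come from any particular library. The idea is that each step is a plain function, so revisiting an earlier step means changing one entry in a list rather than rewriting the whole script.

```python
from typing import Callable, Iterable, List

# A step is any function that maps a list of documents to a new list.
Step = Callable[[List[str]], List[str]]


def lowercase(docs: List[str]) -> List[str]:
    """Data-preparation step: normalize case."""
    return [doc.lower() for doc in docs]


def strip_whitespace(docs: List[str]) -> List[str]:
    """Data-preparation step: trim and collapse runs of whitespace."""
    return [" ".join(doc.split()) for doc in docs]


def run_pipeline(docs: List[str], steps: Iterable[Step]) -> List[str]:
    """Apply each step in order and return the transformed documents."""
    for step in steps:
        docs = step(docs)
    return docs


raw = ["  Hello   World ", "NLP  is FUN"]

# Two experiments that differ by a single data transform.
baseline = run_pipeline(raw, [strip_whitespace])
variant = run_pipeline(raw, [strip_whitespace, lowercase])
```

Because each step has the same signature, the two experiments above differ only in the list of steps passed in, which keeps it cheap to go back and measure the effect of one transform on later results.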

If you are interested in how to write better NLP code, especially for research or experimentation, consider looking up the slide deck titled "Writing Code for NLP Research", by Joel Grus of the Allen Institute for AI (AllenAI).

Let's expand a little on each of these steps.
