
  • Machine Learning With Go
  • Daniel Whitenack
  • 181 words
  • 2021-07-08 10:37:29

Pachyderm jargon

Think about versioning data in Pachyderm kind of like versioning code in Git. The primitives are similar:

  • Repositories: These are versioned collections of data, similar to having versioned collections of code in Git repositories
  • Commits: Data is versioned in Pachyderm by making commits of that data into data repositories
  • Branches: These are lightweight pointers to certain commits or sets of commits (for example, master points to the latest HEAD commit)
  • Files: Data is versioned at the file level in Pachyderm, and Pachyderm automatically employs strategies, such as de-duplication, to keep your versioned data space-efficient
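
To make these primitives concrete, here is a minimal in-memory sketch in Go. It is a toy model only, not Pachyderm's API or internals: the Repo and Commit types, the NewRepo and CommitFiles functions, and the file names are all hypothetical and exist purely for illustration. A commit is an immutable snapshot of files, and a branch is just a named pointer that moves to the newest commit.

```go
package main

import "fmt"

// Commit is an immutable snapshot of a repository: file path -> contents.
// (Hypothetical type for illustration; not a Pachyderm type.)
type Commit struct {
	ID     int
	Parent *Commit
	Files  map[string]string
}

// Repo is a versioned collection of data. Each branch is a lightweight
// pointer to a commit.
type Repo struct {
	Name     string
	Branches map[string]*Commit
	nextID   int
}

// NewRepo returns an empty repository with no branches.
func NewRepo(name string) *Repo {
	return &Repo{Name: name, Branches: make(map[string]*Commit), nextID: 1}
}

// CommitFiles creates a new commit on the given branch. The new snapshot
// holds the parent's files plus the changes, and the branch pointer is
// moved to the new commit.
func (r *Repo) CommitFiles(branch string, changes map[string]string) *Commit {
	parent := r.Branches[branch]
	files := make(map[string]string)
	if parent != nil {
		for path, contents := range parent.Files {
			files[path] = contents
		}
	}
	for path, contents := range changes {
		files[path] = contents
	}
	c := &Commit{ID: r.nextID, Parent: parent, Files: files}
	r.nextID++
	r.Branches[branch] = c
	return c
}

func main() {
	repo := NewRepo("training-data")
	first := repo.CommitFiles("master", map[string]string{"iris.csv": "5.1,3.5,setosa\n"})
	second := repo.CommitFiles("master", map[string]string{"labels.csv": "setosa\n"})

	// master now points at the second commit; the first commit is unchanged.
	fmt.Println(repo.Branches["master"].ID, len(second.Files), len(first.Files))
	// prints: 2 2 1
}
```

Copying every file into every snapshot keeps the sketch short, but it is exactly the kind of duplication that a real system has to avoid for large data.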

Even though versioning data with Pachyderm feels similar to versioning code with Git, there are some major differences. For example, merging data doesn't exactly make sense: if there were merge conflicts on petabytes of data, no human could resolve them. Furthermore, the Git protocol would not, in general, be space-efficient for large sets of data. Pachyderm uses its own internal logic to version data and to work with versioned data, and that logic is both space-efficient and, in terms of caching, processing-efficient.
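
As a rough illustration of how de-duplication can keep versioned data space-efficient, the following Go sketch stores each distinct piece of content once, keyed by its SHA-256 hash, so that two versions sharing an unchanged file share one stored copy. This is a generic content-addressing technique, assumed here for illustration; the text above does not say how Pachyderm implements de-duplication, and the Store, Snapshot, NewStore, and Put names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store keeps each distinct blob exactly once, keyed by the hex-encoded
// SHA-256 hash of its contents. (Hypothetical type for illustration.)
type Store struct {
	blobs map[string][]byte
}

// NewStore returns an empty content-addressed store.
func NewStore() *Store {
	return &Store{blobs: make(map[string][]byte)}
}

// Put saves data only if identical content has not been stored before,
// and returns the content hash either way.
func (s *Store) Put(data []byte) string {
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	if _, ok := s.blobs[key]; !ok {
		// Copy the bytes so later changes by the caller cannot alter the blob.
		s.blobs[key] = append([]byte(nil), data...)
	}
	return key
}

// Snapshot is one version of the data: file path -> content hash.
type Snapshot map[string]string

func main() {
	store := NewStore()
	big := []byte("imagine a very large data file")

	// Version 1 contains one file; version 2 keeps it and adds another.
	v1 := Snapshot{"data.csv": store.Put(big)}
	v2 := Snapshot{
		"data.csv":  store.Put(big),
		"notes.txt": store.Put([]byte("added in v2")),
	}

	// Two versions, three file entries, but only two blobs are stored.
	fmt.Println(len(store.blobs), v1["data.csv"] == v2["data.csv"])
	// prints: 2 true
}
```

Hashing whole files is the simplest form of this idea; it saves nothing when a large file changes slightly, which is one reason production systems tend to use finer-grained strategies.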