  • Mastering Julia
  • Malcolm Sherrington
  • 261 words
  • 2021-07-16 13:42:41

Data arrays and data frames

Users of R will be aware of the success of data frames in analyzing datasets, a success that Python has mirrored with the pandas package. Julia also supports data frames through the DataFrames package, which is available on GitHub and is installed in the usual way.

The package extends Julia's base by introducing three basic types:

  • NA: An indicator that a data value is missing
  • DataArray: An extension to the Array type that can contain missing values
  • DataFrame: A data structure for representing tabular datasets

It is such a large topic that we will be looking at data frames in some depth when we consider statistical computing in Chapter 4, Interoperability.

However, to get a flavor of processing data with these packages, consider the following:

julia> Pkg.add("DataFrames")
# If not already installed; adding DataFrames will also add the DataArray and Blocks frameworks.
julia> using DataFrames
julia> d0 = @data([1.,3.,2.,NA,6.])
5-element DataArray{Float64,1}:
 1.0
 3.0
 2.0
 NA
 6.0

Common operations such as computing mean(d0) or var(d0) [variance] will produce NA because of the missing value in d0[4]:

julia> isna(d0[4]) # => true
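For comparison, the same propagation can be sketched in a current Julia release (1.x). This is an assumption relative to the text above, which targets the older DataArrays-based API: in current Julia the value `missing` from Base takes the role of NA, and `ismissing` takes the role of isna. The variable name `x` is only illustrative.

```julia
using Statistics  # standard library; provides mean

# A vector that may hold missing values; `missing` stands in for NA here.
x = [1.0, 3.0, 2.0, missing, 6.0]

eltype(x)         # Union{Missing, Float64}
ismissing(x[4])   # true
mean(x)           # missing, because the missing entry propagates
```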

We can create a new array by removing all the NA values, after which statistical functions can be applied as normal:

julia> d1 = removeNA(d0) # => 4-element Array{Float64,1}
julia> (mean(d1), var(d1)) # => (3.0,4.66667)
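The same step can be sketched for a current Julia release (1.x), again as an assumption rather than the API used in this text: `skipmissing` from Base plays a role similar to removeNA, and `collect` turns its lazy result into an ordinary vector. The names `x` and `x1` are illustrative.

```julia
using Statistics  # standard library; provides mean and var

x = [1.0, 3.0, 2.0, missing, 6.0]

# skipmissing returns a lazy iterator over the non-missing entries.
m = mean(skipmissing(x))   # 3.0
v = var(skipmissing(x))    # 14/3, approximately 4.66667

# collect materializes the iterator as a plain Vector{Float64}.
x1 = collect(skipmissing(x))
```

The values agree with the (3.0,4.66667) result shown above, since the four remaining entries are the same.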

Notice that if we try to convert a data array to a normal array, this will fail for d0 because of the NA values but will succeed for d1:

julia> convert(Array,d0) # => MethodError(convert,(Array{T,N},[1.0,3.0,2.0,NA,6.0]))
julia> convert(Array,d1) # => 4-element Array{Float64,1}:
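A comparable conversion check can be sketched in a current Julia release (1.x), which is an assumption relative to the DataArrays API shown above. Converting a vector that still contains `missing` to `Vector{Float64}` should raise a MethodError, while the cleaned vector converts without trouble. The names `x`, `x1`, `failed`, `y`, and `z` are illustrative.

```julia
x = [1.0, 3.0, 2.0, missing, 6.0]

# Conversion to a plain Float64 vector fails while a missing value is present.
failed = try
    convert(Vector{Float64}, x)
    false
catch err
    err isa MethodError
end

# After dropping the missing entry, the conversion succeeds.
x1 = collect(skipmissing(x))
y = convert(Vector{Float64}, x1)

# Alternative: substitute a default value instead of dropping the entry.
z = coalesce.(x, 0.0)
```

Whether to drop or substitute missing entries depends on the analysis; substituting 0.0 here is only to show the mechanism and would bias the mean and variance.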