
Get and clean up the data

You can get a CSV file of the data from https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ASE_2014_00CSA01&prodType=table. Just hit the Download button and click OK. The result is a CSV file that has lots of interesting information in it. If you open it, though, it doesn't really look like an easy-to-use data file.

A single data row looks like this:

00100000US,,United States,,00,Total for all sectors,,001,All firms,001,All firms,00,All firms,003,Equally veteran-/nonveteran-owned,319,Firms with 4 to 5 years in business,2014,12174,11571648,107722,2746052,6.3,15.3,17.8,16.4

So, we'll sanitize the data a bit before we start processing it with D3. There are many different ways to do this. You can open the file in Excel and select the fields you want, use command-line filtering utilities to extract the required data, or write a simple Python or R script that returns the data you want. Since we're already working with JavaScript and installed Node.js in Chapter 1, Getting Started with D3, let's write a simple script that filters our data. We won't filter too much; we'll just get rid of the data we're not interested in:

  • We're not interested in the data for a specific industry sector, so we start by keeping only the rows whose NAICS.id column has the value 00 (Total for all sectors).
  • Next, we'll drop the columns that aren't interesting for us. What we want are the columns that indicate gender, ethnic group, race, veteran status, and time in business, and finally the column that contains the number of businesses.

We use the following simple Node.js script for that:

var d3 = require('d3');
var fs = require('fs');

// read the data
fs.readFile('./ASE_2014_00CSA02.csv', function (err, fileData) {
  // stop early if the file could not be read
  if (err) throw err;
  var rows = d3.csvParse(fileData.toString());

  // filter out the sector specific stuff
  var allSectors = rows.filter(function (row) {
    return row['NAICS.id'] === '00';
  });

  // remove unused columns, and make nice headers
  var mapped = allSectors.map(function (el) {
    return {
      sex: el['SEX.id'],
      sexLabel: el['SEX.display-label'],
      ethnicGroup: el['ETH_GROUP.id'],
      ethnicGroupLabel: el['ETH_GROUP.display-label'],
      raceGroup: el['RACE_GROUP.id'],
      raceGroupLabel: el['RACE_GROUP.display-label'],
      vetGroup: el['VET_GROUP.id'],
      vetGroupLabel: el['VET_GROUP.display-label'],
      yearsInBusiness: el['YIBSZFI.id'],
      yearsInBusinessLabel: el['YIBSZFI.display-label'],
      count: el['FIRMPDEMP']
    };
  });

  // write the cleaned rows back out as CSV; the callback reports write errors
  fs.writeFile('./businessFiltered.csv', d3.csvFormat(mapped), function (err) {
    if (err) throw err;
  });
});

In this script, we use the fs.readFile API of Node.js to read the downloaded file from the filesystem, and then use D3 to parse the CSV. After parsing, we filter out the rows we don't want, and use map to convert each remaining row to a simpler object that holds only the columns we need. Finally, we use the d3.csvFormat function to convert the result back to CSV and write it out with the fs.writeFile API call. To run this script yourself, navigate to the <DVD3>/src/chapter-02/data/ directory and run node ./cleanBusinesses.js. The result is a clean and easy-to-understand CSV that we can process in our visualization:

sex,sexLabel,ethnicGroup,ethnicGroupLabel,raceGroup,raceGroupLabel, ... 
001,All firms,001,All firms,00,All firms, ...
001,All firms,001,All firms,00,All firms, ...

With this data, we can now very easily select specific groups to visualize by just filtering on the sex, ethnicGroup, raceGroup, and vetGroup properties.
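
As a rough sketch of what such a selection could look like, the following snippet filters rows on a set of property values. The selectGroup helper and the sampleRows array are hypothetical names introduced here for illustration. The first sample row reuses the values from the data row shown earlier, and the second row is made up. Since d3.csvParse returns every value as a string, the criteria are compared as strings too.

```javascript
// Hypothetical helper: keep the rows whose properties match every key in `criteria`.
function selectGroup(rows, criteria) {
  return rows.filter(function (row) {
    return Object.keys(criteria).every(function (key) {
      return row[key] === criteria[key];
    });
  });
}

// Sample rows in the shape of businessFiltered.csv (label columns omitted).
// Row 1 reuses the values of the data row shown earlier; row 2 is made up
// for illustration and is not a census figure.
var sampleRows = [
  { sex: '001', ethnicGroup: '001', raceGroup: '00', vetGroup: '003', yearsInBusiness: '319', count: '12174' },
  { sex: '001', ethnicGroup: '001', raceGroup: '00', vetGroup: '001', yearsInBusiness: '319', count: '0' }
];

// Select all firms (sex 001) that are equally veteran-/nonveteran-owned (vetGroup 003).
var equallyVeteranOwned = selectGroup(sampleRows, { sex: '001', vetGroup: '003' });
console.log(equallyVeteranOwned.length); // prints 1
```

In the real visualization, the rows would come from parsing businessFiltered.csv rather than from an inline array, and the same helper could be called with any combination of the sex, ethnicGroup, raceGroup, and vetGroup properties.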
