- Natural Language Processing with Java and LingPipe Cookbook
- Breck Baldwin, Krishna Dayanidhi
- 2021-08-05 17:12:49
Applying a classifier to a .csv file
Now, we can test our language ID classifier on the data we downloaded from Twitter. This recipe will show you how to run the classifier on the .csv file and will set the stage for the evaluation step in the next recipe.
How to do it...
Applying a classifier to the .csv file is straightforward! Just perform the following steps:
- Get a command prompt and run:
java -cp lingpipe-cookbook.1.0.jar:lib/lingpipe-4.1.0.jar:lib/twitter4j-core-4.0.1.jar:lib/opencsv-2.4.jar com.lingpipe.cookbook.chapter1.ReadClassifierRunOnCsv
- This will use the default CSV file, data/disney.csv, from the distribution, run over each line of the CSV file, and apply a language ID classifier from models/3LangId.LMClassifier to it:
InputText: When all else fails #Disney
Best Classified Language: english
InputText: ES INSUPERABLE DISNEY !! QUIERO VOLVER:(
Best Classified Language: Spanish
- You can also pass the input .csv file as the first argument and the path to the serialized classifier as the second.
How it works…
We will deserialize a classifier from the externalized model that was described in the previous recipes. Then, we will iterate through each line of the .csv file and call the classify method of the classifier. The code in main() is:
// Use command-line arguments if given, otherwise fall back to the defaults
String inputPath = args.length > 0 ? args[0] : "data/disney.csv";
String classifierPath = args.length > 1 ? args[1] : "models/3LangId.LMClassifier";
// Deserialize the language ID classifier built in the earlier recipes
@SuppressWarnings("unchecked")
BaseClassifier<CharSequence> classifier
    = (BaseClassifier<CharSequence>) AbstractExternalizable.readObject(new File(classifierPath));
// Read the data rows (header and empty-text rows are dropped)
List<String[]> lines = Util.readCsvRemoveHeader(new File(inputPath));
for (String[] line : lines) {
    String text = line[Util.TEXT_OFFSET];
    Classification classified = classifier.classify(text);
    System.out.println("InputText: " + text);
    System.out.println("Best Classified Language: " + classified.bestCategory());
}
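To see the control flow of this loop without the LingPipe jars on the classpath, here is a minimal sketch. It assumes the rows are already loaded (as Util.readCsvRemoveHeader would return them) and substitutes a plain java.util.function.Function for classifier.classify(text).bestCategory(). The class name ClassifyRowsSketch, the method classifyRows, and its textOffset parameter are hypothetical and not part of the book's code; the stub classifier in main() is for illustration only and does no language identification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: mirrors the classification loop in main()
// without depending on LingPipe.
public class ClassifyRowsSketch {

    // Applies bestCategory to the text column of every row and returns
    // the same two report lines per row that main() prints.
    static List<String> classifyRows(List<String[]> rows, int textOffset,
                                     Function<CharSequence, String> bestCategory) {
        List<String> report = new ArrayList<>();
        for (String[] row : rows) {
            String text = row[textOffset];
            report.add("InputText: " + text);
            report.add("Best Classified Language: " + bestCategory.apply(text));
        }
        return report;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"When all else fails #Disney"});
        // Stand-in classifier (assumption, for illustration only). With LingPipe
        // you would pass: text -> classifier.classify(text).bestCategory()
        Function<CharSequence, String> stub = text -> "english";
        for (String line : classifyRows(rows, 0, stub)) {
            System.out.println(line);
        }
    }
}
```

Passing the classifier as a function keeps the loop independent of how the model is loaded, so the same loop works for any object that can map text to a category name.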
The preceding code builds on the previous recipes and introduces nothing particularly new. Util.readCsvRemoveHeader, shown as follows, reads the .csv file from disk, skips its first (header) line, and returns only the rows that have a non-null, non-empty string in the TEXT_OFFSET position:
public static List<String[]> readCsvRemoveHeader(File file) throws IOException {
    FileInputStream fileIn = new FileInputStream(file);
    InputStreamReader inputStreamReader = new InputStreamReader(fileIn, Strings.UTF8);
    CSVReader csvReader = new CSVReader(inputStreamReader);
    List<String[]> rows = new ArrayList<String[]>();
    try {
        csvReader.readNext(); // skip the header row
        String[] row;
        while ((row = csvReader.readNext()) != null) {
            // skip rows that are too short or have no text in the text column
            if (row.length <= TEXT_OFFSET
                    || row[TEXT_OFFSET] == null
                    || row[TEXT_OFFSET].equals("")) {
                continue;
            }
            rows.add(row);
        }
    } finally {
        csvReader.close(); // also closes the underlying reader and file stream
    }
    return rows;
}
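The filtering rule can be checked in isolation. The following sketch assumes the CSV has already been parsed into rows (in the real method, opencsv's CSVReader does that part) and applies the same logic: drop the first row as the header, then keep only rows with a non-null, non-empty string at the text offset. It also drops rows too short to have that column. The class RemoveHeaderSketch and the method removeHeaderAndEmptyText are hypothetical names; the offset is passed as a parameter rather than guessing the value of Util.TEXT_OFFSET.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the header skipping and empty-text filtering
// done by Util.readCsvRemoveHeader, applied to rows that are already parsed.
public class RemoveHeaderSketch {

    static List<String[]> removeHeaderAndEmptyText(List<String[]> parsedRows, int textOffset) {
        List<String[]> rows = new ArrayList<>();
        // Start at index 1 so that the header row is skipped.
        for (int i = 1; i < parsedRows.size(); i++) {
            String[] row = parsedRows.get(i);
            // Keep only rows that have a non-null, non-empty text column.
            if (row.length > textOffset && row[textOffset] != null && !row[textOffset].isEmpty()) {
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> parsed = Arrays.asList(
                new String[] {"id", "text"},                       // header
                new String[] {"1", "When all else fails #Disney"},
                new String[] {"2", ""},                            // empty text, dropped
                new String[] {"3"});                               // no text column, dropped
        List<String[]> kept = removeHeaderAndEmptyText(parsed, 1);
        System.out.println(kept.size()); // prints 1
    }
}
```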