A First Look at Apache OpenNLP
2021-07-21 06:55
Reference: https://blog.csdn.net/Richard_vi/article/details/78909939?utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromBaidu%7Edefault-5.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromBaidu%7Edefault-5.control
Original article: https://www.cnblogs.com/yuyu666/p/15029427.html

Environment: IDEA + JDK 8 + Maven 3.5.2

With the Maven dependency described below in place, the OpenNLP tools are ready to use. Let's look at some examples.

Each example needs a pretrained English model (en-sent.bin for sentence detection, en-token.bin for tokenization), which has to be downloaded first. Here the models are placed under E:\NLP_Practics\models\.

Nothing magical happens in these examples: they only demonstrate a simple, ready-made model. A model is distilled from a large amount of training data, so the quality of the analysis depends on the model you use.
Create a new Maven project and add the OpenNLP dependency, opennlp-tools (groupId org.apache.opennlp), to its pom.xml.

// divide a paragraph into sentences
public static void SentenceDetect() throws IOException {
    String paragraph = "Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me. ";
    // try-with-resources closes the model stream even if loading the model fails
    try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-sent.bin")) {
        SentenceModel model = new SentenceModel(is);
        SentenceDetectorME sdetector = new SentenceDetectorME(model);
        String[] sentences = sdetector.sentDetect(paragraph);
        for (String single : sentences) {
            System.out.println(single);
        }
    }
}
More models can be downloaded from:
http://maven.tamingtext.com/opennlp-models/models-1.5/
The corresponding output:
Hi. How are you?
This is JD_Dog.
He is my good friends.He is very kind.but he is no more handsome than me.
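As a point of comparison for the model-based output above, the sketch below splits the same paragraph with java.text.BreakIterator from the JDK. This is a rule-based splitter, not OpenNLP, and the class name RuleBasedSentences is made up for illustration. It needs no model file, and its boundaries may differ from those of SentenceDetectorME.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RuleBasedSentences {

    // Splits text into sentences using the JDK's rule-based sentence iterator.
    // Assumption: English text; no trained model is involved.
    public static List<String> split(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Each [start, end) range is one sentence, including trailing whitespace.
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String paragraph = "Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me. ";
        for (String sentence : split(paragraph)) {
            System.out.println(sentence);
        }
    }
}
```

Running both splitters on the same text is a quick way to see where a trained model and fixed rules disagree.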
Now let's look at an English tokenization example:

// divide a text into words (tokens)
public static void Tokenize() throws IOException {
    // try-with-resources closes the model stream even if loading the model fails
    try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-token.bin")) {
        TokenizerModel model = new TokenizerModel(is);
        Tokenizer tokenizer = new TokenizerME(model);
        String[] tokens = tokenizer.tokenize("Hi. How are you? This is Richard. Richard is still single. please help him find his girl");
        for (String a : tokens) {
            System.out.println(a);
        }
    }
}
Output:
Hi
.
How
are
you
?
This
is
Richard
.
Richard
is
still
single
.
please
help
him
find
his
girl
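To make the token list above easier to interpret, here is a minimal rule-based sketch that uses only a regular expression. It is not how TokenizerME works; the class name RegexTokens and the pattern are assumptions chosen for illustration. On the simple sentence used above it yields the same tokens, but on harder input it falls short: "don't" comes out as don, ' and t.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokens {

    // A token is either a run of word characters or a single character
    // that is neither a word character nor whitespace (e.g. punctuation).
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hi. How are you? This is Richard."));
        // prints [Hi, ., How, are, you, ?, This, is, Richard, .]
    }
}
```

A trained tokenizer earns its keep on the cases such fixed rules handle poorly, which is why the choice of model matters.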
Complete test code:

package package01;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Test01 {

    // divide a paragraph into sentences
    public static void SentenceDetect() throws IOException {
        String paragraph = "Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me. ";
        // try-with-resources closes the model stream even if loading the model fails
        try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-sent.bin")) {
            SentenceModel model = new SentenceModel(is);
            SentenceDetectorME sdetector = new SentenceDetectorME(model);
            String[] sentences = sdetector.sentDetect(paragraph);
            for (String single : sentences) {
                System.out.println(single);
            }
        }
    }

    // divide a text into words (tokens)
    public static void Tokenize() throws IOException {
        try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-token.bin")) {
            TokenizerModel model = new TokenizerModel(is);
            Tokenizer tokenizer = new TokenizerME(model);
            String[] tokens = tokenizer.tokenize("Hi. How are you? This is Richard. Richard is still single. please help him find his girl");
            for (String a : tokens) {
                System.out.println(a);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Test01.SentenceDetect();  // uncomment to run the sentence detection example
        Test01.Tokenize();
    }
}
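The code above hard-codes a Windows path to the model files, so it fails with a bare FileNotFoundException on any other machine. One possible way to loosen that, sketched with the JDK only: read the model directory from a system property and fall back to the directory used in this post. The class ModelLocator and the property name nlp.models.dir are hypothetical, invented for this sketch; OpenNLP itself does not read that property.

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ModelLocator {

    // Hypothetical property name chosen for this sketch; OpenNLP does not read it.
    static final String DIR_PROPERTY = "nlp.models.dir";

    // Fallback: the directory used in the examples of this post.
    static final String DEFAULT_DIR = "E:\\NLP_Practics\\models";

    // Returns the path of the given model file, or fails with a message
    // that names the full path that was checked.
    public static Path locate(String fileName) throws FileNotFoundException {
        String dir = System.getProperty(DIR_PROPERTY, DEFAULT_DIR);
        Path path = Paths.get(dir, fileName);
        if (!Files.isRegularFile(path)) {
            throw new FileNotFoundException("Model file not found: " + path.toAbsolutePath());
        }
        return path;
    }

    public static void main(String[] args) throws FileNotFoundException {
        System.out.println(locate("en-token.bin"));
    }
}
```

With this helper, the stream could be opened as new FileInputStream(ModelLocator.locate("en-token.bin").toFile()), and the directory chosen at launch with -Dnlp.models.dir=<your models directory>.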