TextBlob: Text Processing, Part-of-Speech Tagging, and Sentiment Analysis
1 Introduction
TextBlob is a simple, easy-to-use NLP library built on top of NLTK and pattern. It provides text processing, sentiment analysis, and related features.
Installation (versions used in this article):
textblob==0.18.0
nltk==3.8.1
Test environment: Python 3.10.9
Before first use, run the following code to download the NLTK data that TextBlob relies on:
import nltk
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
nltk.download('brown')
nltk.download('wordnet')
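Running the four downloads every time is harmless but slow. A minimal sketch that downloads only what is missing is shown here; it assumes the standard NLTK data layout, where each package is looked up under the paths listed in `REQUIRED` (those paths and the helper names `missing_resources` / `ensure_resources` are this sketch's own, not part of TextBlob).

```python
import nltk

# Package name -> path that nltk.data.find() searches for.
# Assumption: the default NLTK data directory layout (taggers/, tokenizers/, corpora/).
REQUIRED = {
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
    "punkt": "tokenizers/punkt",
    "brown": "corpora/brown",
    "wordnet": "corpora/wordnet",
}


def missing_resources(required=REQUIRED):
    """Return the package names whose data cannot be found locally."""
    missing = []
    for name, path in required.items():
        try:
            nltk.data.find(path)  # raises LookupError when the data is absent
        except LookupError:
            missing.append(name)
    return missing


def ensure_resources(required=REQUIRED):
    """Download only the missing packages and return their names."""
    missing = missing_resources(required)
    for name in missing:
        nltk.download(name, quiet=True)
    return missing
```

Calling `ensure_resources()` once at the start of a script makes later runs skip the network entirely when everything is already installed.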
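2 Part-of-Speech Tagging
Definition: labeling each word in a text with its part of speech (noun, verb, adjective, and so on).
Uses:
- Analyzing the grammatical structure of sentences
- Supporting more advanced text analysis, such as syntactic parsing
- Identifying and extracting information about specific words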
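from textblob import TextBlob
# The trailing 测试 ("test") shows how non-English input is handled
text = "Natural Language Processing is fascinating. 测试"
blob = TextBlob(text)
# POS tagging: blob.tags is a list of (word, tag) pairs using Penn Treebank tags (NN, VB, JJ, ...)
print("POS tags:")
for word, pos in blob.tags:
    print(f"{word} - {pos}")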
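For the third use listed above, extracting specific kinds of words, the `(word, tag)` pairs from `blob.tags` can be filtered by tag prefix. This is a plain-Python sketch; the helper names `words_with_tag` and `tag_counts` are hypothetical, and grouping tags by prefix relies on the Penn Treebank naming convention.

```python
from collections import Counter


def words_with_tag(tagged, prefix):
    """Return the words whose Penn Treebank tag starts with `prefix`.

    `tagged` is a list of (word, tag) pairs, the shape of TextBlob's blob.tags.
    Prefix matching groups related tags: "NN" covers NN, NNS, NNP and NNPS
    (nouns), "VB" covers every verb form, "JJ" every adjective form.
    """
    return [word for word, tag in tagged if tag.startswith(prefix)]


def tag_counts(tagged):
    """Count how often each tag occurs, e.g. to profile a sentence's structure."""
    return Counter(tag for _, tag in tagged)
```

With the blob from the example above, `words_with_tag(blob.tags, "NN")` would collect its nouns and proper nouns.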

3 Sentiment Analysis
Original author: 有勇气的牛排
https://www.couragesteak.com/article/455
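3.1 Polarity Analysis
Definition
Polarity analysis can be framed as a task: given a piece of opinionated text, such as a review or comment, label it as an overall positive or overall negative evaluation.
Put simply, it decides whether a piece of text is positive or negative, in favor or against.
Why does polarity analysis matter?
Polarity analysis has considerable commercial and public-service value, for example in monitoring public opinion on websites, forums, Weibo, Douyin, and Kuaishou, and in tracking product reviews.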
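To make the definition concrete, here is a toy lexicon-based polarity classifier: each known word carries a score, the scores are averaged, and the sign of the average decides the label. This is a simplified illustration of the general idea only, not TextBlob's actual algorithm; the lexicon entries, their scores, and the one-word negation rule are invented for the example.

```python
# Toy lexicon: word -> polarity score in [-1.0, 1.0].
# Entries and scores are invented for illustration only.
LEXICON = {
    "good": 0.7,
    "great": 0.8,
    "fascinating": 0.7,
    "bad": -0.7,
    "terrible": -1.0,
}
NEGATIONS = {"not", "never", "no"}


def polarity(text):
    """Average the scores of lexicon words; a negation flips the next word."""
    scores = []
    negate = False
    for token in text.lower().split():
        token = token.strip(".,!?;:")
        if token in NEGATIONS:
            negate = True
            continue
        if token in LEXICON:
            score = LEXICON[token]
            scores.append(-score if negate else score)
        negate = False  # negation only applies to the word right after it
    return sum(scores) / len(scores) if scores else 0.0


def label(text):
    """Map the averaged score to an overall positive / negative / neutral label."""
    score = polarity(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("This is a good idea"))       # positive
print(label("The service was not good"))  # negative
```

Real analyzers use far larger lexicons and handle intensifiers ("very good") and longer-range negation, but the averaging step is the same basic idea.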
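3.2 Example Code
from textblob import TextBlob
text = "Natural Language Processing is fascinating."
blob = TextBlob(text)
# Sentiment analysis: returns a (polarity, subjectivity) named tuple.
# polarity ranges from -1.0 (negative) to 1.0 (positive);
# subjectivity from 0.0 (objective) to 1.0 (subjective).
sentiment = blob.sentiment
print(f"Sentiment: polarity {sentiment.polarity}, subjectivity {sentiment.subjectivity}")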

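4 Spell Checking and Correction
from textblob import TextBlob
text = "this is a gaod ideo"
blob = TextBlob(text)
# Spell checking: correct() replaces each word with its most likely spelling
# and returns a new TextBlob
corrected_blob = blob.correct()
print(f"Original: {text}")
print(f"Corrected: {corrected_blob}")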

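5 Words and Sentences
from textblob import TextBlob
text = "this is a good idea"
blob = TextBlob(text)
# Word operations: blob.words tokenizes the text into a WordList
words = blob.words
print("Words:", words)
# Pluralize every word in the list
plural = words.pluralize()
print("Plural:", plural)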

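6 Word Senses and Lemmas
Definition: retrieving a word's senses, synonyms, lemma (base form), and so on.
Uses:
- Understanding a word's meaning and grammatical form
- Information retrieval and knowledge graphs
- Improving other text processing and analysis tasks
from textblob import Word
word = Word("dragon")
# Word senses: a list of WordNet synsets
synsets = word.synsets
print("Senses:", synsets)
# Lemma (base form); the word is treated as a noun unless a POS is given
lemma = word.lemmatize()
print(f"Lemma: {lemma}")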
