Research overview of microblog analysis (微博分析研究综述)

Fund project:

National Natural Science Foundation of China (71271076); Hebei Province Statistical Science Research Program (2013H210); Five Platforms Open Fund of Hebei University of Science and Technology (WH03)

    Abstract:

    Microblogs are among the most important platforms for spreading social information today. Because they are easy to use and spread information quickly, people use them to express their views on emergencies, public figures, popular products, and daily life directly and promptly. Exploiting this mass of microblog information requires combining several analysis methods to uncover its potential value. This paper reviews the current state of research in microblog analysis, presents a microblog analysis system developed by the authors, and discusses future research directions.

    First, the paper introduces the two main technical methods used in microblog analysis: the microblog open platform and web crawlers. The open interfaces provided by the microblog platform give convenient, fast access to information such as post content, user comments, user profile details, the number of followers, and the number of accounts followed. These interfaces also impose restrictions: only a limited number of requests is allowed per hour, and the platform does not open all of its information resources. Web crawlers can obtain more information. A general crawler that works over the whole web covers a wider range, while a focused crawler selectively crawls predefined topics.

    Second, the paper presents the current hot issues in microblog analysis, which concern user behavior and microblog content. Research on user behavior includes: 1) Propagation network research, which uses visualization tools such as Gephi to show the propagation paths, propagation range, and key forwarding nodes of a post, and can be used to predict future propagation. 2) Propagation factor research, which analyzes user behavior to reveal the possible causes of information dissemination. 3) User influence analysis, for which different scholars have proposed different measures; an accurate evaluation has to weigh many factors together, such as the number of followers, forwards, mentions, replies, and social relationships. Analysis of microblog content includes: 1) Text preprocessing, which consists of two steps, word segmentation and stop-word removal. 2) Hot topic detection, whose common methods are word-frequency statistics and text clustering; both help improve the detection of hot topics, but neither accounts for the dynamic evolution of topics. 3) Sentiment analysis, also known as opinion mining, which has long been a hot issue in microblog research. Emotional words can be extracted with the help of microblog emoticons and combined with a constructed semantic dictionary and machine learning methods to classify posts and finally determine their sentiment polarity. Applications include public opinion monitoring, business forecasting, and product selection.

    Third, the paper presents We-Reading (阅微), a microblog analysis system developed by the authors, and describes its three main modules: sentiment analysis, geographical distribution, and propagation graph. The sentiment analysis module classifies the sentiment of user comments with a sentiment-lexicon method. The geographical distribution module extracts the location information of participating users and analyzes it statistically to show how a post spreads across the country. The propagation graph module uses visualization to show how microblog information diffuses, including forwarding relations, forwarding levels, and forwarding range.

    Finally, the paper summarizes the challenges of microblog analysis from two aspects, technology and application. On the technical side, future work can overcome the resource limits of the microblog interfaces and improve the efficiency and precision of microblog analysis. On the application side, it can develop applications for event monitoring, management, and commerce.
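
    The content-analysis steps named in the abstract (preprocessing, word-frequency topic detection, and lexicon-based sentiment classification) can be illustrated with a minimal Python sketch. It is not the We-Reading implementation, which the abstract does not describe in detail. The word lists and the function names `preprocess`, `hot_terms`, and `polarity` are hypothetical, and whitespace splitting stands in for a real Chinese word segmenter.

```python
from collections import Counter

# Hypothetical toy resources; a real system would load full lexicons.
STOP_WORDS = {"the", "a", "is", "this", "and", "to", "of"}
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "angry"}


def preprocess(text):
    """Split a post into lowercase tokens and drop stop words."""
    # Assumption: whitespace splitting replaces Chinese word segmentation.
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]


def hot_terms(posts, k=3):
    """Return the k most frequent terms across all posts."""
    counts = Counter(t for post in posts for t in preprocess(post))
    return [term for term, _ in counts.most_common(k)]


def polarity(text):
    """Classify a post by counting lexicon hits."""
    tokens = preprocess(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

    As the abstract notes for word-frequency methods in general, a count like `hot_terms` ignores how a topic changes over time; tracking that would need per-time-window counts or a dedicated topic-evolution model.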

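Cite this article

刘滨, 张静远, 刘强, 赵静阳, 李寒, 徐巍巍. Research overview of microblog analysis (微博分析研究综述) [J]. Journal of Hebei University of Science and Technology (河北科技大学学报), 2015, 36(1): 100-110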

History
  • Received: 2014-10-26
  • Last revised: 2014-11-25
  • Published online: 2015-01-22