
Research on the Automatic Extraction of Topic Text Information from Web Pages


Full text: about 14,000 characters  Original date: 2022 or earlier

[Abstract]

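The Internet has become a global information hub and the main source from which people obtain information. However, most of the markup in the HTML standard used on today's Internet describes presentation for the browser, and the browser only lays out a page's structure and appearance; it does not analyze or process the content the page presents. The openness and heterogeneity of HTML also make it extremely difficult for a computer to obtain a page's topic information quickly and accurately. By analyzing how web pages on the current Internet are structured and presented, and examining their content through their layout structure, this thesis designs a new method for extracting the topic information from a web page, so that a computer can use the algorithm to extract a page's main content and then process it as data.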


Keywords  HTML web pages    topic information   extraction algorithm    JAVA

Title    Study of an Approach and Algorithm for Automatically Extracting Topic Information from Semi-Structured Text (Web Pages)

Abstract
The Internet has become a global information center and the main source from which the public obtains information. At present, however, most of the HTML markup used on the Internet is directed at browser presentation, and the browser is responsible only for the structure and layout of web pages; it cannot process or analyze their content. At the same time, the openness and heterogeneity of HTML make it extremely difficult for a computer to obtain the topic information of a website quickly and accurately. By analyzing the structure and content of current web pages, this thesis designs a new method for extracting topic information from web pages, so that a computer can use the algorithm to extract a page's topic content and then process it as data more easily.


Keywords  HTML Webpage    Topic Information
          Information Extraction    JAVA
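
The abstract does not spell out the extraction algorithm itself. As a rough illustration of the general idea (strip the markup, then locate the most text-dense region of the page), here is a minimal Java sketch using a simple line-block text-density heuristic. This is a commonly used substitute technique, not the method designed in the thesis. The class name `TopicTextExtractor`, the `blockSize` and `threshold` parameters, and the regex-based tag stripping are hypothetical choices made only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: pulls the main (topic) text out of an HTML page with a
 * simple line-block text-density heuristic. Not the algorithm from the thesis.
 */
class TopicTextExtractor {

    // Elements whose text should never count as body text: scripts, styles, comments.
    private static final Pattern NOISE = Pattern.compile(
            "(?is)<script.*?</script>|<style.*?</style>|<!--.*?-->");
    // Any remaining tag; the markup is dropped, the text between tags is kept.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /**
     * @param html      raw page source
     * @param blockSize number of consecutive text lines that form one block (assumed parameter)
     * @param threshold minimum characters for a block to count as text-dense (assumed parameter)
     * @return the lines of the densest region, or "" if no block reaches the threshold
     */
    static String extract(String html, int blockSize, int threshold) {
        String text = NOISE.matcher(html).replaceAll("");
        text = TAG.matcher(text).replaceAll("").replace("&nbsp;", " ");

        // One entry per source line, with runs of whitespace collapsed.
        List<String> lines = new ArrayList<>();
        for (String raw : text.split("\\r?\\n")) {
            lines.add(raw.replaceAll("\\s+", " ").trim());
        }

        int n = lines.size();                          // split() always yields at least one element
        int k = Math.max(1, Math.min(blockSize, n));   // block size clamped to [1, n]
        int blocks = n - k + 1;

        // Text length of each sliding block of k consecutive lines.
        int[] blockLen = new int[blocks];
        for (int i = 0; i < blocks; i++) {
            for (int j = i; j < i + k; j++) {
                blockLen[i] += lines.get(j).length();
            }
        }

        // Among runs of consecutive dense blocks, keep the one covering the most text.
        int bestStart = -1, bestEnd = -1, bestScore = 0;
        int i = 0;
        while (i < blocks) {
            if (blockLen[i] < threshold) { i++; continue; }
            int start = i;
            while (i < blocks && blockLen[i] >= threshold) { i++; }
            int end = (i - 1) + (k - 1);   // last line covered by the run's final block
            int score = 0;
            for (int j = start; j <= end; j++) { score += lines.get(j).length(); }
            if (score > bestScore) { bestScore = score; bestStart = start; bestEnd = end; }
        }
        if (bestStart < 0) { return ""; }

        // Join the non-empty lines of the winning region.
        StringBuilder sb = new StringBuilder();
        for (int j = bestStart; j <= bestEnd; j++) {
            String line = lines.get(j);
            if (line.isEmpty()) { continue; }
            if (sb.length() > 0) { sb.append('\n'); }
            sb.append(line);
        }
        return sb.toString();
    }
}
```

A call such as `TopicTextExtractor.extract(html, 3, 50)` returns the paragraphs of the densest region and drops short navigation and footer lines that sit in sparse blocks. Because whole blocks are kept, up to `blockSize - 1` short lines adjacent to the article can still slip in at the edges, and entities other than `&nbsp;` are left undecoded; a fuller extractor would also use the page's layout structure, which the abstract says the proposed method analyzes.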

 
