Research on Cloud Computing Processing and Optimization of Distributed Computer

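Preface

After a year of blockchain-related study at HKU, I became interested in distributed storage, and my graduation project was also related to IPFS (see "Uright - Blockchain Music Copyright Management ÐApp"). After returning to mainland China, I had the chance to work with Dean Sun Ye of CNFS Protocol Lab on this paper, "Research on Cloud Computing Processing and Optimization of Distributed Computer" (network storage and optimization based on the CNFS blockchain). Writing it gave me a deeper understanding of distributed network storage and computing, so I am recording it here.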

This paper was accepted by ICCEA (2021 International Conference on Electronic, Electrical and Computer).

Abstract

With the rapid growth of network traffic, video, pictures, and other information generate large volumes of data, which strains computer computation and storage. As the demand for processing capacity increases, traditional computing methods can no longer meet the needs of society, and computing is gradually moving toward Cloud Computing (hereinafter referred to as CDC) and distributed computing. With distributed computing, a large task is decomposed into many small tasks that are distributed to different computing resources. Distributed computing has therefore become the main way of CDC processing and can meet the needs of the existing market. CNFS, short for Computer Network File System, is a global, peer-to-peer, versioned distributed file system. Through CNFS, all computing devices that share the same file system can be connected into what can be called an information processing system. This paper first analyzes the related concepts, then analyzes the architecture of CDC, and finally puts forward some suggestions.

1. Introduction

With the development of IT, computer information has become an indispensable part of people's lives, which requires us to keep improving information computing capability [1]. Distributed computing has therefore become the main approach, because it allows more efficient computing and processing [2]. CNFS (Computer Network File System) is a peer-to-peer distributed file system that aims to replace the traditional HTTP system [3]. CNFS has drawn on the lessons of earlier successful systems, which has made it a cornerstone of CDC and cloud storage. It is also expected to become a cornerstone of blockchain. The key technology of CDC is decentralization, and CNFS is a well-suited solution that can play a significant role [4-6]. The decentralized technology of CNFS has been applied in many fields, where it can solve many problems of existing platforms [7].

2.1 Distributed computing

Distributed computing divides a large task into many small tasks that can be distributed to different computing resources. A distributed system is a collection of independent computers that behaves like a single computer, which makes it possible to balance cost, efficiency, and scalability. Since the 1980s, distributed computing has been a focus of research, covering a variety of systems such as middleware, SOA, grid computing, web services, and the Hadoop platform [8]. Before the emergence of CDC, grid computing was the most typical representative of distributed computing. By connecting the hardware, software, and information resources scattered across the Internet into one large whole, grid computing lets people use geographically dispersed resources to complete a variety of large-scale, complex computing and data processing tasks. Grid computing is an Internet-level distributed computing method that mainly uses the computing resources distributed across the Internet. It is the approach closest to CDC and can achieve centralized parallel processing of large computing tasks. The development of grid computing technology has contributed a great deal and has become the technical basis for the development of CDC [9]. Distributed computing is one of the most important supporting technologies of CDC. Taking Google's CDC as an example, distributed computing cases mainly include the distributed data storage system GFS and the distributed data management system Bigtable, alongside the open-source Hadoop platform. In the PaaS and SaaS layers of CDC, distributed computing will be an important technology. With distributed computing, we can decouple users from large application systems. Overall, distributed computing gave rise to CDC. In the CDC environment, distributed computing in turn reshapes the application and service forms of CDC and provides a simple and feasible computing method for big data applications [10].
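The divide-and-distribute idea described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names (`split_task`, `small_task`, `run_distributed`) are hypothetical, the workload (a sum of squares) is only a stand-in, and threads in one process stand in for separate computing resources.

```python
from concurrent.futures import ThreadPoolExecutor


def split_task(items, n_parts):
    """Divide one large task (a list of items) into at most n_parts small tasks."""
    size = max(1, -(-len(items) // n_parts))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def small_task(chunk):
    """Work done by one computing resource: here, a partial sum of squares."""
    return sum(x * x for x in chunk)


def run_distributed(items, n_workers=4):
    """Distribute the small tasks to workers and merge the partial results."""
    chunks = split_task(items, n_workers)
    # Threads are an assumption made for brevity; a real system would
    # send each chunk to a different machine.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(small_task, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(run_distributed(list(range(10)), n_workers=3))  # prints 285
```

The final merge step is what lets the collection of workers behave like a single computer from the caller's point of view.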

2.2 Advantages of CNFS

CNFS provides a new distributed Internet infrastructure on which many different types of applications can be built. It is a global, mountable, and versioned file system with several advantages [11]. First, decentralization makes access faster. All data in CNFS is stored on users' own computers, which is equivalent to distributing the central server of HTTP to every user. A user who wants the data can fetch it from the nearest user's computer. Second, it reduces dependence on the backbone network. The transmission method of CNFS differs clearly from that of HTTP: HTTP mainly depends on the backbone network [12-14], whereas CNFS transmits data from one node to another. As a result, CNFS can switch to another node immediately if one node fails. Third, it offers permanent data storage. CNFS uses a fragmented storage mode in which data is divided into many parts, so that someone who holds only some of the parts cannot obtain the complete data. Data can therefore be saved safely and permanently [15].
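The fragmented storage and node switching described above can be sketched as follows. This is an illustration under stated assumptions, not the CNFS implementation: the paper does not specify the fragment size, the hash function, or the placement rule, so SHA-256 digests, round-robin placement with two replicas, and the names `fragment`, `place`, and `fetch` are all hypothetical.

```python
import hashlib


def fragment(data: bytes, size: int):
    """Split data into fixed-size fragments, each keyed by its SHA-256 digest."""
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    return [(hashlib.sha256(p).hexdigest(), p) for p in pieces]


def place(fragments, nodes, replicas=2):
    """Store each fragment on `replicas` consecutive nodes (assumed rule)."""
    stores = {n: {} for n in nodes}
    for i, (digest, piece) in enumerate(fragments):
        for r in range(replicas):
            stores[nodes[(i + r) % len(nodes)]][digest] = piece
    return stores


def fetch(manifest, stores, failed=()):
    """Rebuild the data from an ordered digest list, skipping failed nodes."""
    out = b""
    for digest in manifest:
        for node, store in stores.items():
            if node in failed or digest not in store:
                continue  # switch to another node
            piece = store[digest]
            if hashlib.sha256(piece).hexdigest() != digest:
                raise ValueError("corrupted fragment")
            out += piece
            break
        else:
            raise LookupError("fragment unavailable on every live node")
    return out


if __name__ == "__main__":
    data = b"hello distributed world"
    frags = fragment(data, 5)
    manifest = [d for d, _ in frags]
    stores = place(frags, ["a", "b", "c"])
    print(fetch(manifest, stores, failed=("a",)) == data)  # prints True
```

In this sketch no single node stores every fragment, and the data is still recoverable when one node is marked as failed, because each fragment has a second copy elsewhere.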

3. CDC processing

3.1 K-nearest neighbor method

K-nearest neighbor (KNN) is a typical ranking classification algorithm. In a single pass, a ranking algorithm can assign a document to multiple categories. With KNN, we calculate the similarity between the test text and each text in the training sample set and find the k most similar training texts. The candidate categories are then sorted by score, and a threshold is chosen to select among them. The similarity between the k nearest-neighbor training samples and the test sample is given in Formula 1. The weight of each class is calculated from the k neighbors, as shown in Formula 2.

[Formulas 1 and 2: cnfs_knn_formula]
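The ranking procedure can be sketched in Python. Because Formulas 1 and 2 are not reproduced in the text, the sketch assumes two common choices rather than the paper's exact definitions: cosine similarity over word counts as the similarity measure, and a class weight equal to the sum of the similarities of the neighbors that carry that class. The function names and the default threshold are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_scores(test_text, training, k=3):
    """training is a list of (text, set_of_categories) pairs.

    Returns (category, weight) pairs sorted by weight, highest first.
    """
    x = Counter(test_text.lower().split())
    sims = [(cosine(x, Counter(t.lower().split())), cats) for t, cats in training]
    neighbours = sorted(sims, key=lambda s: s[0], reverse=True)[:k]
    scores = defaultdict(float)
    for sim, cats in neighbours:
        for c in cats:
            scores[c] += sim  # similarity-weighted vote (assumed weighting)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def classify(test_text, training, k=3, threshold=0.2):
    """Keep every category whose weight reaches the threshold."""
    return [c for c, s in knn_scores(test_text, training, k) if s >= threshold]


if __name__ == "__main__":
    training = [
        ("cloud storage node", {"storage"}),
        ("cloud computing task", {"compute"}),
        ("football match score", {"sport"}),
    ]
    print(classify("cloud storage", training, k=2, threshold=0.5))  # prints ['storage']
```

Lowering the threshold lets a document fall into several categories at once, which is the multi-category behavior the section describes.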

3.2 CDC architecture

CDC is a collection of services that provides elastic resources on demand. Its architecture can be divided into three levels: core services, service management, and the user access interface, as shown in Figure 1.

[Figure 1: cnfs_cdc_architecture]

3.3 File storage verification scheme

The file storage verification scheme is the basis on which service providers prove to service consumers the integrity of the data they store. After each service, the service information is written into the blockchain. CDC has thereby become an important computing and storage mode for blockchain. The file storage method is shown in Figure 2.

[Figure 2: cnfs_block_structure]
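One way to picture "writing the service information into the blockchain" is a hash-linked list of service records that a consumer can check. The sketch below is an assumption-laden illustration, not the block structure of Figure 2: the record fields (`file_id`, `digest`), the use of SHA-256, and all function names are hypothetical.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def block_hash(block: dict) -> str:
    """Hash the block contents, excluding the stored hash itself."""
    body = {k: block[k] for k in ("index", "prev_hash", "record")}
    return digest(json.dumps(body, sort_keys=True).encode())


def append_record(chain: list, record: dict) -> dict:
    """Write one service record into the chain, linked to the previous block."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "prev_hash": prev, "record": record}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def chain_is_valid(chain: list) -> bool:
    """Check every link and every block hash."""
    prev = "0" * 64
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev:
            return False
        if block["hash"] != block_hash(block):
            return False
        prev = block["hash"]
    return True


def verify_storage(chain: list, file_id: str, returned: bytes) -> bool:
    """Compare returned bytes with the latest digest recorded for file_id."""
    for block in reversed(chain):
        if block["record"]["file_id"] == file_id:
            return chain_is_valid(chain) and block["record"]["digest"] == digest(returned)
    return False


if __name__ == "__main__":
    chain = []
    append_record(chain, {"file_id": "f1", "digest": digest(b"abc")})
    print(verify_storage(chain, "f1", b"abc"))  # prints True
    print(verify_storage(chain, "f1", b"abd"))  # prints False
```

Because each block commits to the hash of the previous one, changing an old record invalidates every later block, which is what makes the recorded digests usable as integrity evidence.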

4. Important technologies of CDC

4.1 Location service based on mobile cloud

Location-based services are an indispensable supporting technology of mobile CDC. Their architecture covers mainstream positioning technology, location indexing, query processing, and so on. Location services based on traditional positioning technologies such as GPS cover a wide area and have been widely used in fields such as the military and transportation. However, GPS suffers from weak signal penetration and high energy consumption for positioning, so it cannot fully meet the requirements of new mobile applications such as accurate indoor positioning and user action recognition. Through CDC, we can deliver mobile cloud location services such as automatic shopping guides and patient monitoring in smart homes. The mobile CDC model has been used to build such new location services, which makes location service an important supporting technology.

4.2 Energy saving technology of mobile terminal

The battery capacity of mobile terminals is growing slowly, and the contradiction between rapidly growing, increasingly rich mobile applications and the limited power of mobile terminals is becoming more prominent. Through CDC, we can save energy in many areas, such as data transmission. Wireless data transmission accounts for a growing share of the energy consumption of mobile terminals. When data is transmitted over a cellular network, the RRC protocol can usually be used to measure the energy consumption of the mobile terminal over the whole process. The results show that too much tail energy is consumed during data transmission, which reduces the energy utilization of mobile terminals. By changing the time threshold of the tail state, we can reduce how often and how long the terminal stays in that state. Transmission scheduling can also reduce tail energy consumption. With the virtual ending mechanism and a double-queue scheduling algorithm, prefetched data and delayed transmissions can be scheduled together, and the time threshold can be adjusted.
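The effect of the tail threshold and of transmission scheduling can be shown with a toy model. It is a simplification made for illustration and not the measurement method or the scheduling algorithm the paper refers to: sends are treated as instantaneous, the radio is assumed to stay in a high-power tail state after each send until the timeout expires or the next send begins, and the power and timing values are invented.

```python
def tail_energy(send_times, tail_timeout, tail_power):
    """Energy (joules) spent in the tail state after each send.

    Assumed model: after a send the radio stays in the tail state for
    tail_timeout seconds, or until the next send starts if that is sooner.
    """
    times = sorted(send_times)
    total = 0.0
    for i, t in enumerate(times):
        gap = times[i + 1] - t if i + 1 < len(times) else float("inf")
        total += tail_power * min(tail_timeout, gap)
    return total


def batch(send_times, window):
    """Delay each send to the end of its window so sends share one tail."""
    return sorted({(int(t // window) + 1) * window for t in send_times})


if __name__ == "__main__":
    sends = [0, 12, 25, 31, 47]          # seconds, illustrative values
    print(tail_energy(sends, 10, 0.5))   # prints 23.0
    print(tail_energy(sends, 4, 0.5))    # shorter tail threshold: prints 10.0
    print(tail_energy(batch(sends, 30), 10, 0.5))  # batched sends: prints 10.0
```

Both levers reduce the time spent in the tail state: a shorter threshold shortens each tail, while batching delay-tolerant transfers reduces the number of tails at the cost of added latency.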

4.3 Data security and privacy protection

While obtaining the rich services of CDC, mobile users face more security threats, such as privacy exposure. This requires us to strengthen data security and privacy protection in CDC. In a mobile CDC environment, users' data and computing tasks migrate over wireless networks to support online queries, multi-user data sharing, and so on. Given the limited computing resources and the mobility of mobile terminals, a cloud authentication platform can avoid the degradation of service performance caused by parallel access from multiple users. Among a series of new cryptographic mechanisms, we can choose between encryption schemes, including attribute-based encryption. By introducing an access structure that associates the ciphertext or the user's private key with attributes, we can flexibly express access control policies and provide fine-grained access authorization for data.
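The role of the access structure can be illustrated by evaluating a policy tree against a user's attribute set. The sketch below shows only the policy logic and contains no cryptography. In attribute-based encryption the same check is enforced by the decryption mathematics rather than by an `if` statement. The tuple-based policy encoding, the attribute names, and the function name `satisfies` are hypothetical.

```python
def satisfies(policy, attributes):
    """Return True if the attribute set satisfies the access structure.

    A policy is either an attribute name (a string) or a tuple
    (gate, children) where gate is "AND" or "OR".
    """
    if isinstance(policy, str):
        return policy in attributes
    gate, children = policy
    results = [satisfies(child, attributes) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate}")


if __name__ == "__main__":
    # Illustrative policy: doctors in cardiology, or any auditor.
    policy = ("OR", [("AND", ["doctor", "cardiology"]), "auditor"])
    print(satisfies(policy, {"doctor", "cardiology"}))  # prints True
    print(satisfies(policy, {"doctor", "oncology"}))    # prints False
```

Expressing the policy over attributes rather than over individual user identities is what makes the authorization fine-grained: a new user with the right attributes gains access without the data being encrypted again.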

5. Conclusion

At present, CDC has become an important way of processing and storing data, and it can be optimized through decentralization and other measures. CNFS is an attempt at a new technology that aims to replace the traditional HTTP system. Distributed computing will therefore become the main computing method in the future and can improve a wide range of IT.

References

[1] Cui Yong, Song Jian, Miao Congcong, Tang Jun. Research progress and trend of mobile CDC [J]. Chinese Journal of Computers, 2017, 40 (02): 273-295.

[2] Ding Jian, Wang Huaimin, Shi Peichang, Wu Qingbo, Dai Huadong, Fu Hongyi. Trusted cloud service [J]. Chinese Journal of Computers, 2015, 38 (01): 133-149.

[3] Li Jia. Intelligent logistics model reconstruction based on big data CDC [J]. China's circulation economy, 2019, 33 (02): 20-29.

[4] Lu Xiaobin, Wang Jianya. Analysis of the current situation of CDC adoption behavior [J]. Journal of Chinese library, 2015, 41 (01): 92-111.

[5] Lu Xiaobin, Wang Tao. Research on technical improvement and optimization of massive data analysis process by Google's three CDC technologies [J]. Library and information work, 2015, 59 (03): 6-11+102.

[6] Peng Xiaosheng, Deng Diyuan, Cheng Shijie, Wen Jinyu, Li Chaohui, Niu Lin. Key technologies of power big data for smart grid application [J]. Journal of China Electric Engineering, 2015, 35 (03): 503-511.

[7] Qin Rongsheng. Research on the impact of big data and CDC technology on audit [J]. Audit research, 2014 (06): 23-28.

[8] Shi Weisong, Zhang Xingzhou, Wang Yifan, Zhang Qingyang. Edge computing: present situation and prospect [J]. Computer research and development, 2019, 56 (01): 69-89.

[9] Shi Weisong. Edge computing: a new computing model in the era of Internet of things [J]. Computer research and development, 2017, 54 (05): 907-924.

[10] Sun Lei, Hu Xuelong, Zhang Xiaobin, Li Yun. CDC solutions for biomedical big data processing [J]. Journal of electronic measurement and instruments, 2014, 28 (11): 1190-1197.

[11] Wang Guiling, Han Yanbo, Zhang Zhongmei, Zhu Meiling. Stream data integration and service based on CDC [J]. Chinese Journal of Computers, 2017, 40 (01): 107-125.

[12] Wang Yuding, Yang Jiahai, Xu Cong, Ling Xiao, Yang Yang. Overview of CDC access control technology [J]. Acta software Sinica, 2015, 26 (05): 1129-1150.

[13] Xu Baomin, Ni Xuguang. Development trend and key technology progress of CDC [J]. Journal of Chinese Academy of Sciences, 2015, 30 (02): 170-180.

[14] Yang Qingfeng. Key technology prediction and strategic selection in CDC era [J]. Journal of Chinese Academy of Sciences, 2015, 30 (02): 148-161+169.

[15] Zhou Yuezhi, Zhang Di. Near end CDC: opportunities and challenges in the post CDC era [J]. Chinese Journal of Computers, 2019, 42 (04): 677-700.