Advances in Intelligent Systems and Computing, Volume 1251

Kohei Arai, Supriya Kapoor, Rahul Bhatia (Editors)

Intelligent Systems and Applications: Proceedings of the 2020 Intelligent Systems Conference (IntelliSys), Volume 2
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing,
Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering,
University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University,
Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas
at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao
Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology,
University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute
of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro,
Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management,
Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering,
The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on the theory, applications, and design methods of intelligent systems and intelligent computing. Virtually all disciplines are covered, such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, and life science. The list of topics spans all the areas of modern intelligent systems and computing, such as: computational intelligence; soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms; social intelligence; ambient intelligence; computational neuroscience; artificial life; virtual worlds and society; cognitive science and systems; perception and vision; DNA and immune-based systems; self-organizing and adaptive systems; e-learning and teaching; human-centered and human-centric computing; recommender systems; intelligent control; robotics and mechatronics including human-machine teaming; knowledge-based paradigms; learning paradigms; machine ethics; intelligent data analysis; knowledge management; intelligent agents; intelligent decision making and support; intelligent network security; trust management; interactive entertainment; and Web intelligence and multimedia.
The publications within “Advances in Intelligent Systems and Computing” are
primarily proceedings of important conferences, symposia and congresses. They
cover significant recent developments in the field, both of a foundational and
applicable character. An important characteristic feature of the series is the short
publication time and world-wide distribution. This permits a rapid and broad
dissemination of research results.
Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, DBLP, SCOPUS, Google Scholar and SpringerLink.
Editors

Kohei Arai, Saga University, Saga, Japan
Supriya Kapoor, The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Rahul Bhatia, The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Editor’s Preface
This book contains the scientific contributions included in the program of the
Intelligent Systems Conference (IntelliSys) 2020, which was held during September
3–4, 2020, as a virtual conference. The Intelligent Systems Conference is a pres-
tigious annual conference on areas of intelligent systems and artificial intelligence
and their applications to the real world.
This conference not only presented state-of-the-art methods and valuable
experience from researchers in the related research areas, but also provided the
audience with a vision of further development in the fields. We have gathered a
multi-disciplinary group of contributions from both research and practice to discuss
the ways in which intelligent systems are today architected, modeled, constructed,
tested and applied in various domains. The aim was to further increase the body of
knowledge in this specific area by providing a forum to exchange ideas and discuss
results.
The program committee of IntelliSys 2020 represented 25 countries, and authors
submitted 545 papers from 50+ countries. This certainly attests to the widespread,
international importance of the theme of the conference. Each paper was reviewed
on the basis of originality, novelty and rigor. After the reviews, 214 papers were
accepted for presentation, of which 177 are being published in these
proceedings.
The conference would truly not function without the contributions and support
received from authors, participants, keynote speakers, program committee mem-
bers, session chairs, organizing committee members, steering committee members
and others in their various roles. Their valuable support, suggestions, dedicated
commitment and hard work made IntelliSys 2020 a success. We warmly
thank and greatly appreciate the contributions, and we kindly invite all to continue
to contribute to future IntelliSys conferences.
It has been a great honor to serve as the General Chair of IntelliSys 2020 and
to work with the conference team. We believe this event will certainly help further
disseminate new ideas and inspire more international collaborations.
Kind Regards,
Kohei Arai
Conference Chair
CapsNet vs CNN
U. Manogaran et al.
Abstract. Despite its success in recent years, the convolutional neural network (CNN) has a major limitation: the inability to retain the spatial relationship between learned features in deeper layers. The capsule network with dynamic routing (CapsNet) was introduced in 2017 with the speculation that it could overcome this limitation. In our research, we created a suitable collection of datasets and implemented a simple CNN model and a CapsNet model of similar complexity to test this speculation. Experimental results show that both the implemented CNN and CapsNet models are able to capture the spatial relationship between learned features. Counterintuitively, our experiments show that our CNN model outperforms our CapsNet model on our datasets, which implies that the speculation is not entirely correct. This might be because our datasets are too simple and hence require only a simple CNN model. We recommend that future research test the speculation using deeper models and more complex datasets.
1 Introduction
2 Related Work
The concept of the CNN was first proposed by LeCun et al. [12] in 1989. However, due to the lack of computational power and of available datasets, it was not until recent years that researchers were able to develop feasible models utilizing modern high-performance computers. One notable breakthrough was the work of Krizhevsky et al. [1], which achieved state-of-the-art performance in the ImageNet challenge [13] in 2012. Since then, countless studies have been conducted to develop more advanced CNN models, which have been used successfully in real-world applications such as speech recognition [14], gait recognition [15], steering control in self-driving cars [16], human crowd detection [17], and medical image segmentation [18].
Despite successful demonstrations of CNN models, one of the pioneers, Geoffrey Hinton, argued that current CNNs “are misguided in what they are trying to achieve” [19] because of their use of pooling layers for subsampling. Such models lose the ability to compute precise spatial relationships between learned features in the deeper layers. When a pooling layer is used between convolutional layers, only the most active neuron in a local region of a feature map is retained while the rest of the neurons are disregarded. This disregard of neurons causes the loss of spatial information about the features. Furthermore, because scalars are used instead of vectors, properties of features such as orientation, thickness, and skewness are lost. Therefore, Hinton et al. [19] proposed grouping neurons together as vectors and using them to represent the features of an object. These vectors are called capsules.
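The pooling argument can be made concrete with a small numpy illustration of our own (not from the paper): two feature maps whose single active neuron sits at different positions inside the same pooling window produce identical pooled outputs, so the precise location of the feature is discarded.

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling with stride 2 over a 2-D feature map."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# The active neuron lies in a different corner of the pooling window,
# yet both maps pool to the same output, so its position is lost.
a = np.array([[9, 0],
              [0, 0]])
b = np.array([[0, 0],
              [0, 9]])
print(max_pool2x2(a), max_pool2x2(b))  # both [[9]]
```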
In 2017, Hinton and his team [11] proposed an architecture called the capsule network with dynamic routing (CapsNet) that performed better than CNNs on the MNIST dataset, achieving a state-of-the-art test error of only 0.25%. A CapsNet model that achieved 99.23% accuracy on the expanded MNIST test set was able to reach 79% accuracy on the affNIST test set, while a CNN that achieved 99.22% accuracy on the expanded MNIST test set reached only 66% on affNIST. This demonstrates that CapsNet is more robust to affine transformations.
We implemented CapsNet based on the original research paper [11]; through the reconstruction network described there, it can be seen that CapsNet preserves the properties of the features well, as shown in Fig. 1.
3 Methodology
We implemented a CNN model and a CapsNet model for this study. In general, a
CNN model consists of several convolutional layers with pooling layers between
them followed by fully-connected layers as explained in Sect. 3.2. A CapsNet
model consists of several convolutional layers without any pooling layers followed
by a primary capsule layer and a digit capsule layer as explained in Sect. 3.3.
Both of the models were designed to have the same number of layers in order
for them to be comparable.
In order to test the speculation, we need to design a dataset in such a way
that there are two classes of images containing the same features but the features
from different classes have different spatial arrangements. Training our models
directly on such a dataset may not yield any insight into the models as the
models will learn to identify the shape of the entire objects successfully instead
of the distinct features as intended.
Therefore, we prepared two groups of datasets whereby the first group con-
tains images of only distinct features while the second group contains objects
formed by the composition of the features. Our models are first trained on the
dataset from Group 1. Once the training is completed, the weights of the con-
volutional layers in both models will be frozen while the weights of the rest of
the layers will be re-trained on the dataset from Group 2. This will ensure that
our models learn the distinct features first before learning to identify the objects
using the learned features. This strategy is known as transfer learning.
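The freeze-and-retrain scheme can be sketched framework-independently: in the second stage, parameters marked as frozen simply receive no gradient updates. The toy numpy sketch below is our own illustration of the idea; the parameter names and shapes are arbitrary, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy parameter store: a "conv" stage trained on Group 1 and a "head"
# stage that is re-trained on Group 2.
params = {"conv": rng.normal(size=(4, 4)), "head": rng.normal(size=(4, 2))}

def sgd_step(params, grads, lr=0.1, frozen=()):
    """One SGD update that skips every parameter named in `frozen`."""
    for name, grad in grads.items():
        if name not in frozen:
            params[name] -= lr * grad
    return params

# Stage 2: gradients exist for all parameters, but "conv" stays fixed.
grads = {"conv": np.ones((4, 4)), "head": np.ones((4, 2))}
before = params["conv"].copy()
params = sgd_step(params, grads, frozen=("conv",))
assert np.array_equal(params["conv"], before)  # frozen layer unchanged
```

In a deep learning framework the same effect is achieved by marking the convolutional layers as non-trainable before the second training stage.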
Below we describe in detail the dataset generation and the testing of the convolutional neural network and capsule network models. Since our datasets consist only of objects with simple features,
relatively simple models should be sufficient to achieve good accuracy on the
evaluations.
Our dataset consists of two groups. Figure 2 shows samples from Group 1, which contains images of equilateral triangles and rectangles. Figure 3 shows samples from Group 2, which contains images of arrows and non-arrows. Each image is 64 × 64 pixels.
We chose to use generated images in our dataset because there is too much
ambiguity in real-life images. Furthermore, simple polygon objects were chosen as
they are well-defined mathematically. This would enable us to test out different
ideas on how to design our experiments. Table 1 shows the organization of our datasets.
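Because the shapes are defined mathematically, such images can be generated in a few lines of numpy. The sketch below is our illustration with arbitrarily chosen coordinates (not the authors' generator); it renders a filled rectangle and a filled isoceles triangle on a 64 × 64 canvas.

```python
import numpy as np

def make_rectangle(size=64, top=20, left=16, h=12, w=32):
    """White rectangle on a black size x size canvas."""
    img = np.zeros((size, size), dtype=np.uint8)
    img[top:top + h, left:left + w] = 255
    return img

def make_triangle(size=64, apex=(10, 32), base_y=50, half_w=22):
    """Filled isoceles triangle defined by an apex and a horizontal base."""
    yy, xx = np.mgrid[0:size, 0:size]
    ay, ax = apex
    # 0 at the apex, 1 at the base; the sides widen linearly in between.
    t = (yy - ay) / max(base_y - ay, 1)
    inside = (yy >= ay) & (yy <= base_y) & (np.abs(xx - ax) <= t * half_w)
    return np.where(inside, 255, 0).astype(np.uint8)
```

Composing a triangle head with a rectangle shaft in one image would give an arrow; placing them in implausible arrangements gives a non-arrow.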
The architecture of our CapsNet is similar to the original paper [11], except that we added an extra convolutional layer and used 16-D capsules in the primary capsule layer and 32-D capsules in the digit capsule layer. We used the activation function proposed in the paper. To prevent overfitting, a reconstruction network [11] was used. No pooling layers were used.
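The activation referred to here is the "squash" nonlinearity of Sabour et al. [11], which rescales each capsule vector to a length below 1 while preserving its direction, so the length can be read as the probability that the entity is present. A numpy sketch of ours (not the paper's code):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squash from [11]: v = (|s|^2 / (1 + |s|^2)) * (s / |s|)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)
```

For example, a capsule vector of length 5 is squashed to length 25/26 (about 0.96), while very short vectors are shrunk almost to zero.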
As with the CNN, our experiment was carried out by first training our CapsNet model on the dataset from Group 1; once the model was trained, we re-trained the weights of the primary capsule layer and the digit capsule layer on the dataset from Group 2 while freezing the weights of the convolutional layers. After each training stage, the model was evaluated on the testing set of the respective group.
The training and evaluation of the models were performed on a workstation running Ubuntu 16.04, equipped with 16 GB of RAM, a Core i7-6700K processor, and two NVIDIA GTX 1080 Ti GPUs. The models were trained using the training
subsets and were evaluated on their respective testing sets. The evaluation results
in terms of accuracy (acc), precision (prec), recall (rec) and F1-score (F1) for
both models are shown in Table 2 below.
Table 2. (a) Evaluation results for CapsNet. (b) Evaluation results for CNN

(a) CapsNet
                                  Subset 1                Subset 2                Subset 3
(%)                               Acc   Prec  Rec   F1    Acc   Prec  Rec   F1    Acc   Prec  Rec   F1
Group 1: Triangles vs Rectangles  88.4  88.7  87.2  87.9  91.0  92.5  89.0  90.7  90.4  92.8  87.6  90.1
Group 2: Arrows vs Non-Arrows     67.6  70.6  59.7  64.7  77.8  86.6  62.1  72.3  80.4  83.9  75.1  79.3

(b) CNN
                                  Subset 1                Subset 2                Subset 3
(%)                               Acc   Prec  Rec   F1    Acc   Prec  Rec   F1    Acc   Prec  Rec   F1
Group 1: Triangles vs Rectangles  98.5  99.4  97.1  98.2  99.3  99.5  98.9  99.2  99.6  99.8  99.2  99.6
Group 2: Arrows vs Non-Arrows     92.6  95.0  87.2  90.9  95.8  95.2  95.8  95.5  96.6  96.8  92.5  94.6
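The metrics reported in Table 2 follow the standard definitions for binary classification; for reference, a small helper that computes them from confusion-matrix counts:

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 score from confusion counts."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f1
```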
All the images were shuffled in their respective sets and normalized before
they were used for training and evaluation purposes. From Table 2(a), it is evi-
dent that CapsNet is able to determine whether a given image contains an arrow
or non-arrow by computing the spatial relationship between the learned features.
It can also be seen in Table 2(b) that the CNN achieved near-perfect accuracies.
This is due to the fact that the generated datasets do not contain any real-world
noise.
Based on the speculation stated earlier, we expected the CNN model to perform worse than CapsNet, but the results show that the CNN actually performed better. This might be because the datasets are too simple and hence do not require a deeper CNN model.

The use of pooling layers between the convolutional layers should cause a loss of spatial information about the features in a CNN, so it may simply be that our model is not deep enough for this effect to appear. We expected our CNN model to suffer at least to some degree from its use of three pooling layers, but the results show that this is not the case. We chose a CNN model with only three pooling layers because of the simplicity of the datasets. From the results, it is evident that retaining the spatial relationship between features is not a serious issue for a relatively shallow model with only three pooling layers. It remains questionable whether a deeper CNN model would perform as well on a more complex dataset.
In our experiment, the objects in the images are formed by composing simple
features. There is only one equilateral triangle and one rectangle in every image.
Given the success of CNNs, identifying such simple generated objects without real-world noise is a rather trivial task for a CNN. This could be another reason for the high accuracy that the CNN model achieved in this experiment despite its use of pooling layers. Our implementations are publicly available on GitHub.1
In this work, we have designed an experiment to test the speculation that Cap-
sNet is able to retain the spatial relationship between features better than CNN.
In order to carry out the experiment, we have generated our own datasets.
From our results, both the shallow CNN and CapsNet models have shown the capability to retain the spatial relationship between features. However, the speculation that CapsNet retains the spatial relationship between features better than CNN does not seem to hold for shallow models on simple datasets. It remains uncertain whether the speculation holds for deeper models on more complex or noisy datasets.
Considering the fact that CNNs have been developed extensively since their invention in 1989 [12], it is possible that our experiment was too simple for the CNN. CapsNet, on the other hand, is still at a rudimentary stage, and the fact that its performance level is close to that of the CNN in this experiment means that CapsNet has great potential.
1 https://github.com/MMU-VisionLab/CapsNet-vs-CNN.
Future research in this area should consider using more complex features to represent the objects in the datasets, together with deeper models, in order to further understand the capabilities and limitations of these models. Deeper insight into how these models retain the spatial relationship between features will better guide future developments.
References
1. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep con-
volutional neural networks. In: Advances in Neural Information Processing Sys-
tems, pp. 1097–1105 (2012)
2. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Rabinovich,
A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pp. 1–9 (2015)
3. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556 (2014)
4. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified,
real-time object detection. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 779–788 (2016)
5. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accu-
rate object detection and semantic segmentation. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
6. Nair, P., Doshi, R., Keselj, S.: Pushing the limits of capsule networks. Technical
note (2018)
7. Algamdi, A.M., Sanchez, V., Li, C.T.: Learning temporal information from spatial
information using CapsNets for human action recognition. In: 2019 IEEE Interna-
tional Conference on Acoustics, Speech and Signal Processing (ICASSP), ICASSP
2019, pp. 3867–3871 (2019)
8. Xi, E., Bing, S., Jin, Y.: Capsule network performance on complex data. arXiv
preprint arXiv:1712.03480 (2017)
9. Xiang, C., Zhang, L., Tang, Y., Zou, W., Xu, C.: MS-CapsNet: a novel multi-scale
capsule network. IEEE Signal Process. Lett. 25(12), 1850–1854 (2018)
10. Chidester, B., Do, M.N., Ma, J.: Rotation equivariance and invariance in convolu-
tional neural networks. arXiv preprint arXiv:1805.12301 (2018)
11. Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In:
Advances in Neural Information Processing Systems, pp. 3856–3866 (2017)
12. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W.,
Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural
Comput. 1(4), 541–551 (1989)
13. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-
scale hierarchical image database. In: IEEE Conference on Computer Vision and
Pattern Recognition (2009)
14. Palaz, D., Magimai-Doss, M., Collobert, R.: Analysis of CNN-based speech recog-
nition system using raw speech as input. In: Sixteenth Annual Conference of the
International Speech Communication Association (2015)
15. Zhang, C., Liu, W., Ma, H., Fu, H.: Siamese neural network based gait recognition
for human identification. In: IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), pp. 2832–2836 (2016)
16. Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal,
P., Zhang, X.: End to end learning for self-driving cars. arXiv preprint
arXiv:1604.07316 (2016)
17. Tzelepi, M., Tefas, A.: Human crowd detection for drone flight safety using
convolutional neural networks. In: 25th European Signal Processing Conference
(EUSIPCO), pp. 743–747. IEEE (2017)
18. Milletari, F., Navab, N., Ahmadi, S.A.: V-net: fully convolutional neural networks
for volumetric medical image segmentation. In: IEEE Fourth International Con-
ference on 3D Vision (3DV), pp. 565–571 (2016)
19. Hinton, G.E., Krizhevsky, A., Wang, S.D.: Transforming auto-encoders. In: Pro-
ceedings of the 21st International Conference on Artificial Neural Networks,
Part I, pp. 44–51 (2011)
20. LaLonde, R., Bagci, U.: Capsules for object segmentation. arXiv preprint
arXiv:1804.04241 (2018)
Improved 2D Human Pose Tracking
Using Optical Flow Analysis
1 Introduction
Human motion tracking is an important application of machine vision algorithms
that can be used for many business purposes. The most popular tasks in the digital world include distributed video surveillance systems, solutions for digital marketing, and human tracking in industrial environments.
This task can be addressed at different levels of detail. The high-level approach is object detection, in which the position of a human as a whole object is extracted and its bounding box in 2D or 3D space is estimated.
A more interesting approach is to detect a human pose in motion. This task is more complicated because a human pose has substantially more dimensions than a bounding box.
Recent advances in deep learning have resulted in efficient single-frame pose
tracking algorithms, such as [6,14]. By applying them sequentially to a video
stream, a set of trajectories for joints may be obtained. However, since these
2 Related Work
The task of retrieving pose dynamics for all persons in a video may be considered a variant of the multiple object tracking (MOT) task, where the tracked objects are not persons but individual pose keypoints. There are two major paradigms in the field of MOT: detection-based tracking and detection-free tracking [11]. In the first case, a machine vision algorithm capable of detecting individual objects is applied to every frame separately, and then the individual detections are linked into trajectories.
algorithm and instead relies on temporal changes in the video stream to detect
objects. With the development of efficient real-time object detection algorithms
in recent years, the detection-based approach has become dominant in the lit-
erature. However, independent analysis of video frames results in inevitable loss
of information conveyed by temporal changes in the video. This information
may be relevant to object detection and could help improve the tracker perfor-
mance. Various approaches have been suggested to combine these individual-frame and temporal features.
For example, in [12] a novel approach to combining temporal and spatial features was proposed: a recurrent temporal component was added to a convolutional neural network (CNN) designed to detect objects in a single frame. The outputs of the object detection network in sequential frames were fed into a recurrent neural network (RNN), and the resulting architecture was trained to predict refined tracking locations.
In [1], a tracker using prior information about possible person pose dynamics is proposed. This information is modelled as a hierarchical Gaussian process latent variable model and makes it possible to impose some temporal coherence on the detected articulations.
In [17], a method leveraging optical flow for pose tracking is proposed. The velocities obtained from the flow data are used to predict the expected coordinates of a pose in the next frame; the predicted coordinates are then used to form tracks by greedy matching.
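The prediction step of such a flow-based approach amounts to shifting each keypoint by the flow displacement sampled at its location. Below is a minimal sketch of that idea, assuming a dense H × W × 2 flow field of (dx, dy) displacements; it is our illustration, not code from [17].

```python
import numpy as np

def predict_next(keypoints, flow):
    """Shift each (x, y) keypoint by the flow displacement at its pixel."""
    pred = []
    for x, y in keypoints:
        dx, dy = flow[int(round(y)), int(round(x))]
        pred.append((float(x + dx), float(y + dy)))
    return pred

# A uniform flow of (+1, +2) pixels per frame shifts every keypoint by that amount.
flow = np.tile(np.array([1.0, 2.0]), (48, 64, 1))   # H=48, W=64
print(predict_next([(5.0, 5.0), (10.0, 20.0)], flow))  # [(6.0, 7.0), (11.0, 22.0)]
```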
Our research is based on OpenPose as a body pose detector, proposed in
[3]. It is a real-time solution capable of detecting the 2D poses of multiple people in an image. It uses a non-parametric representation, referred to as
Part Affinity Fields (PAFs), to learn to associate body parts with individuals in
the image. This bottom-up system achieves high accuracy and real-time perfor-
mance, regardless of the number of people in the image.
A. Khelvas et al.
3 Definitions
First, let us define several frames of reference (FoR) for our research, which are
shown in Fig. 1.
4 Method
Our goal is to propose a novel algorithm for robust tracking of multiple person
poses in the video stream by leveraging both temporal and spatial features of the
data. To achieve this, we combine predictions made by a single-frame person/pose detection algorithm (such as OpenPose or YOLO) with optical-flow-based estimates through a Kalman filter.
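One way to picture the fusion step is a constant-velocity Kalman filter per keypoint coordinate, where the detector output acts as a position measurement. The sketch below is our own illustration; the matrices and noise values are arbitrary choices, not values from the paper.

```python
import numpy as np

# Constant-velocity model for one keypoint coordinate.
# State x = [position, velocity]; the single-frame detector supplies the
# position measurement z. All matrix values here are illustrative.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # state transition (dt = 1 frame)
H = np.array([[1.0, 0.0]])          # we observe position only
Q = np.eye(2) * 1e-2                # process noise covariance
R = np.array([[1.0]])               # measurement noise covariance

def kf_step(x, P, z):
    """One predict/update cycle given a detected position z."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = z - H @ x                   # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

Fed with detections of a point moving at a constant 2 px/frame, the estimate converges to the true position and velocity; the optical-flow displacement could enter either as a second measurement or as a control input in the predict step.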
The complete algorithm is described below and shown in Fig. 2.
2. The object detection step provides a set of bounding boxes for each person, detected by YOLO or another object detection algorithm.
3. The pose detection ROI generation step provides a set of input frame regions for further pose detection.