Natural Language
Processing for Global and
Local Business

Fatih Pinarbasi
Istanbul Medipol University, Turkey

M. Nurdan Taskiran
Istanbul Medipol University, Turkey

A volume in the Advances in Business Information Systems and Analytics (ABISA) Book Series
Published in the United States of America by
IGI Global
Business Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue
Hershey PA, USA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: cust@igi-global.com
Web site: http://www.igi-global.com

Copyright © 2021 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in
any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.
Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or
companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.
Library of Congress Cataloging-in-Publication Data
Names: Pinarbași, Fatih, 1991- editor. | Taşkıran, Nurdan Öncel, 1960-
editor.
Title: Natural language processing for global and local business / Fatih
Pinarbași and M. Nurdan Oncel Taskiran, editors.
Description: Hershey, PA : Business Science Reference, [2020] | Includes
bibliographical references and index. | Summary: “This book explores the
theoretical and practical phenomenon of natural language processing
through different languages and platforms in terms of today’s
conditions”-- Provided by publisher.
Identifiers: LCCN 2019059926 (print) | LCCN 2019059927 (ebook) | ISBN
9781799842408 (hardcover) | ISBN 9781799851349 (paperback) | ISBN
9781799842415 (ebook)
Subjects: LCSH: Natural language processing (Computer science) |
Computational linguistics. | Discourse analysis--Data processing. |
Emotive (Linguistics)
Classification: LCC QA76.9.N38 N388 2020 (print) | LCC QA76.9.N38 (ebook)
| DDC 006.3/5--dc23
LC record available at https://lccn.loc.gov/2019059926
LC ebook record available at https://lccn.loc.gov/2019059927

This book is published in the IGI Global book series Advances in Business Information Systems and Analytics (ABISA)
(ISSN: 2327-3275; eISSN: 2327-3283)

British Cataloguing in Publication Data

A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the
authors, but not necessarily of the publisher.

For electronic access to this publication, please contact: eresources@igi-global.com.

Advances in Business
Information Systems and
Analytics (ABISA) Book Series
Madjid Tavana
La Salle University, USA
ISSN:2327-3275
EISSN:2327-3283

Mission
The successful development and management of information systems and business analytics are crucial
to the success of an organization. New technological developments and methods for data analysis have
not only allowed organizations to improve their processes and increase productivity, but have also
provided businesses with a venue through which to cut costs, plan for the future, and maintain
competitive advantage in the information age.
The Advances in Business Information Systems and Analytics (ABISA) Book Series aims to present
diverse and timely research in the development, deployment, and management of business information
systems and business analytics for continued organizational development and improved business value.

Coverage
• Performance Metrics
• Algorithms
• Business Models
• Data Strategy
• Forecasting
• Legal information systems
• Business Information Security
• Statistics
• Strategic Information Systems
• Geo-BIS

IGI Global is currently accepting manuscripts for publication within this series. To submit a proposal
for a volume in this series, please contact our Acquisition Editors at Acquisitions@igi-global.com
or visit: http://www.igi-global.com/publish/.

The Advances in Business Information Systems and Analytics (ABISA) Book Series (ISSN 2327-3275) is published by IGI Global, 701
E. Chocolate Avenue, Hershey, PA 17033-1240, USA, www.igi-global.com. This series is composed of titles available for purchase
individually; each title is edited to be contextually exclusive from any other title within the series. For pricing and ordering
information please visit http://www.igi-global.com/book-series/advances-business-information-systems-analytics/37155. Postmaster:
Send all address changes to above address. © 2021 IGI Global. All rights, including translation in other languages reserved by the
publisher. No part of this series may be reproduced or used in any form or by any means – graphics, electronic, or mechanical,
including photocopying, recording, taping, or information and retrieval systems – without written permission from the publisher,
except for noncommercial, educational use, including classroom teaching purposes. The views expressed in this series are those of
the authors, but not necessarily of IGI Global.
Titles in this Series
For a list of additional titles in this series, please visit:
https://www.igi-global.com/book-series/advances-business-information-systems-analytics/37155

Applications of Big Data and Business Analytics in Management

Sneha Kumari (Vaikunth Mehta National Institute of Cooperative Management, India) K. K. Tripathy (Vaikunth
Mehta National Institute of Cooperative Management, India) and Vidya Kumbhar (Symbiosis International
University (Deemed), India)
Business Science Reference • © 2020 • 300pp • H/C (ISBN: 9781799832614) • US $225.00

Handbook of Research on Integrating Industry 4.0 in Business and Manufacturing

Isak Karabegović (Academy of Sciences and Arts of Bosnia and Herzegovina, Bosnia and Herzegovina) Ahmed
Kovačević (City, University London, UK) Lejla Banjanović-Mehmedović (University of Tuzla, Bosnia and
Herzegovina) and Predrag Dašić (High Technical Mechanical School of Professional Studies in Trstenik, Serbia)
Business Science Reference • © 2020 • 661pp • H/C (ISBN: 9781799827252) • US $265.00

Internet of Things (IoT) Applications for Enterprise Productivity

Erdinç Koç (Bingol University, Turkey)
Business Science Reference • © 2020 • 357pp • H/C (ISBN: 9781799831754) • US $215.00

Trends and Issues in International Planning for Businesses

Babayemi Adekunle (Arden University, UK) Husam Helmi Alharahsheh (University of Wales Trinity Saint David,
UK) and Abraham Pius (Arden University, UK)
Business Science Reference • © 2020 • 225pp • H/C (ISBN: 9781799825470) • US $225.00

Institutional Assistance Support for Small and Medium Enterprise Development in Africa
Isaac Oluwajoba Abereijo (Obafemi Awolowo University, Nigeria)
Business Science Reference • © 2020 • 280pp • H/C (ISBN: 9781522594819) • US $205.00

Role of Regional Development Agencies in Entrepreneurial and Rural Development Emerging Research and
Opportunities
Milan B. Vemić (Union – Nikola Tesla University, Serbia)
Business Science Reference • © 2020 • 246pp • H/C (ISBN: 9781799826415) • US $175.00

Using Applied Mathematical Models for Business Transformation

Antoine Trad (IBISTM, France) and Damir Kalpić (University of Zagreb, Croatia)
Business Science Reference • © 2020 • 543pp • H/C (ISBN: 9781799810094) • US $265.00

Table of Contents

Foreword

Preface

Acknowledgment

Section 1
A General Outlook on Natural Language Processing

Chapter 1
Academy and Company Needs: The Past and Future of NLP
Tiago Martins da Cunha, UNILAB, Brazil

Chapter 2
Deriving Business Value From Online Data Sources Using Natural Language Processing Techniques
Stephen Camilleri, University of Malta, Malta

Chapter 3
Natural Language Processing in Online Reviews
Gunjan Ansari, JSS Academy of Technical Education, Noida, India
Shilpi Gupta, JSS Academy of Technical Education, Noida, India
Niraj Singhal, Shobhit Institute of Engineering and Technology (Deemed), Meerut, India

Chapter 4
Sentiment Analysis as a Restricted NLP Problem
Akshi Kumar, Delhi Technological University, India
Divya Gupta, Galgotias University, India

Chapter 5
Deep Learning for Sentiment Analysis: An Overview and Perspectives
Vincent Karas, University of Augsburg, Germany
Björn W. Schuller, University of Augsburg, Germany

Section 2
Natural Language Processing in Business

Chapter 6
Metaphors in Business Applications: Modelling Subjectivity Through Emotions for Metaphor Comprehension
Sunny Rai, Mahindra Ecole Centrale, Hyderabad, India
Shampa Chakraverty, Netaji Subhas University of Technology, India
Devendra Kumar Tayal, Indira Gandhi Delhi Technical University for Women, India

Chapter 7
Estimating Importance From Web Reviews Through Textual Description and Metrics Extraction
Roney Lira de Sales Santos, University of Sao Paulo, Brazil
Carlos Augusto de Sa, Federal University of Piaui, Brazil
Rogerio Figueredo de Sousa, University of Sao Paulo, Brazil
Rafael Torres Anchiêta, University of Sao Paulo, Brazil
Ricardo de Andrade Lira Rabelo, Federal University of Piaui, Brazil
Raimundo Santos Moura, Federal University of Piaui, Brazil

Chapter 8
Discovery of Sustainable Transport Modes Underlying TripAdvisor Reviews With Sentiment
Analysis: Transport Domain Adaptation of Sentiment Labelled Data Set
Ainhoa Serna, University of the Basque Country, Spain
Jon Kepa Gerrikagoitia, BRTA Basque Research and Technology Alliance, Spain

Chapter 9
Research Journey of Hate Content Detection From Cyberspace
Sayani Ghosal, Ambedkar Institute of Advanced Communication Technologies and Research,
India
Amita Jain, Ambedkar Institute of Advanced Communication Technologies and Research,
India

Chapter 10
The Use of Natural Language Processing for Market Orientation on Rare Diseases
Matthias Hölscher, Institute for IT Management and Digitization, FOM University, Germany
Rudiger Buchkremer, Institute for IT Management and Digitization, FOM University,
Germany

Chapter 11
Quality Assurance in Computer-Assisted Translation in Business Environments
Sanja Seljan, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia
Nikolina Škof Erdelja, Ciklopea, Croatia
Vlasta Kučiš, Faculty of Arts, University of Maribor, Slovenia
Ivan Dunđer, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia
Mirjana Pejić Bach, Faculty of Economics and Business, University of Zagreb, Croatia

Section 3
Diversity Among Languages Over Natural Language Processing

Chapter 12
An Extensive Text Mining Study for the Turkish Language: Author Recognition, Sentiment
Analysis, and Text Classification
Durmuş Özkan Şahin, Ondokuz Mayıs University, Turkey
Erdal Kılıç, Ondokuz Mayıs University, Turkey

Chapter 13
Sentiment Analysis of Arabic Documents: Main Challenges and Recent Advances
Hichem Rahab, ICISI Laboratory, University of Khenchela, Algeria
Mahieddine Djoudi, TechNE Laboratory, University of Poitiers, France
Abdelhafid Zitouni, LIRE Laboratory, University of Constantine 2, Algeria

Chapter 14
Building Lexical Resources for Dialectical Arabic
Sumaya Sulaiman Al Ameri, Khalifa University of Science and Technology, UAE
Abdulhadi Shoufan, Center for Cyber-Physical Systems, Khalifa University of Science and
Technology, UAE

Chapter 15
A Critical Review of the Current State of Natural Language Processing in Mexico and Chile
César Aguilar, Pontificia Universidad Católica de Chile, Chile
Olga Acosta, Singularyta SpA, Chile

Compilation of References

About the Contributors

Index
Detailed Table of Contents

Foreword

Preface

Acknowledgment

Section 1
A General Outlook on Natural Language Processing

Chapter 1
Academy and Company Needs: The Past and Future of NLP
Tiago Martins da Cunha, UNILAB, Brazil

This chapter presents a view of how the use of NLP knowledge might change the relationship between
universities and companies. Products of NLP analysis are expected at both ends of this exchange, which
is at times not so reciprocal. History has shown, however, that the products developed by universities
and by companies are complementary for the development of NLP. The great volume of data the world is
producing requires newer perspectives to make sense of it. The new aspects found in big data may aid the
comprehension of human language categorization and therefore, possibly, of human language acquisition.
But to process data, more data need to be produced, and not all companies have the time to dedicate to
this task. Drawing on a literature review and experience in the field, this chapter aims to show that
partnerships are the most reliable resource for the cycle of knowledge production in NLP. Companies
need to be receptive to the theoretical knowledge the university may provide, and universities must
direct their theoretical knowledge toward a more applied environment.

Chapter 2
Deriving Business Value From Online Data Sources Using Natural Language Processing Techniques
Stephen Camilleri, University of Malta, Malta

The wealth of information produced over the internet empowers businesses to become data-driven
organizations, increasing their ability to predict consumer behavior, take more informed strategic decisions,
and remain competitive on the market. However, past research did not identify which online data sources
companies should choose to achieve such an objective. This chapter aims to analyse how online news
articles, social media messages, and user reviews can be exploited by businesses using natural language
processing (NLP) techniques to build business intelligence. NLP techniques help computers understand
and derive valuable meaning from human (natural) languages. Following a brief introduction to NLP
and a description of how these three text streams differ from each other, the chapter discusses six main
factors that can assist businesses in choosing one data source over another. The chapter concludes with
future directions towards improving business applications involving NLP techniques.

Chapter 3
Natural Language Processing in Online Reviews
Gunjan Ansari, JSS Academy of Technical Education, Noida, India
Shilpi Gupta, JSS Academy of Technical Education, Noida, India
Niraj Singhal, Shobhit Institute of Engineering and Technology (Deemed), Meerut, India

The analysis of the online data posted on various e-commerce sites is required to improve consumer
experience and thus enhance global business. The increase in the volume of social media content in
recent years has led to the problem of overfitting in review classification. Thus, there arises a need to
select relevant features to reduce computational cost and improve classifier performance. This chapter
investigates various statistical feature selection methods that are time efficient but result in the selection
of a few redundant features. To overcome this issue, wrapper methods such as sequential feature selection
(SFS) and recursive feature elimination (RFE) are employed to select an optimal feature set. The
empirical analysis was conducted on a movie review dataset using three different classifiers, and the results
show that SVM could achieve an F-measure of 96% with only 8% of the features selected, using the RFE method.
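
The F-measure reported in the abstract above is the harmonic mean of precision and recall. The following is a minimal Python sketch of that calculation for a binary task; the function name `f_measure`, its parameters, and the toy labels are illustrative assumptions and are not taken from the chapter.

```python
def f_measure(y_true, y_pred, positive=1):
    """Return the F-measure (F1) for one class treated as positive.

    Hypothetical helper for illustration; not the chapter's own code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up labels): 2 true positives, 1 false positive, 1 false negative.
score = f_measure([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

With these toy labels, precision and recall are both 2/3, so the F-measure is also 2/3.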

Chapter 4
Sentiment Analysis as a Restricted NLP Problem
Akshi Kumar, Delhi Technological University, India
Divya Gupta, Galgotias University, India

With the accelerated evolution of social networks, there is a tremendous increase in the opinions people
express about products or services. While this user-generated content in natural language is potentially
valuable, its sheer volume requires the use of content mining methods and NLP to uncover the knowledge
needed for various tasks. In this study, sentiment analysis is used to analyze and understand the opinions of
users using statistical approaches, knowledge-based approaches, hybrid approaches, and concept-based
ontologies. Unfortunately, sentiment analysis also faces a range of difficulties, such as colloquial words,
negation handling, ambiguity in word sense, and coreference resolution, which support the view that
sentiment analysis is a restricted NLP problem. The purpose of this chapter is to discover in what sense
sentiment analysis is a restricted NLP problem. To that end, the chapter discusses the concept
of sentiment analysis in the field of NLP and shows that sentiment analysis is a restricted NLP problem
because of the sophisticated nature of natural language.

Chapter 5
Deep Learning for Sentiment Analysis: An Overview and Perspectives
Vincent Karas, University of Augsburg, Germany
Björn W. Schuller, University of Augsburg, Germany

Sentiment analysis is an important area of natural language processing that can help inform business
decisions by extracting sentiment information from documents. The purpose of this chapter is to introduce
the reader to selected concepts and methods of deep learning and show how deep models can be used
to increase performance in sentiment analysis. It discusses the latest advances in the field and covers
topics including traditional sentiment analysis approaches, the fundamentals of sentence modelling,
popular neural network architectures, autoencoders, attention modelling, transformers, data augmentation
methods, the benefits of transfer learning, the potential of adversarial networks, and perspectives on
explainable AI. The authors’ intent is that through this chapter, the reader can gain an understanding of
recent developments in this area as well as current trends and potential for future research.

Section 2
Natural Language Processing in Business

Chapter 6
Metaphors in Business Applications: Modelling Subjectivity Through Emotions for Metaphor Comprehension
Sunny Rai, Mahindra Ecole Centrale, Hyderabad, India
Shampa Chakraverty, Netaji Subhas University of Technology, India
Devendra Kumar Tayal, Indira Gandhi Delhi Technical University for Women, India

Commercial advertisements, social campaigns, and ubiquitous online reviews are a few non-literary
domains where creative text is profusely embedded to capture a viewer’s imagination. Recent AI business
applications such as chatbots and interactive digital campaigns emphasise the need to process creative text
for a seamless and fulfilling user experience. Figurative text in human communication conveys implicit
perceptions and unspoken emotions. Metaphor is one such figure of speech that maps a latent idea in
a target domain to an evocative concept from a source domain. This chapter explores the problem of
computational metaphor interpretation through the lens of subjectivity. The world wide web is mined
to learn about the source domain concept. Ekman emotion categories and pretrained word embeddings
are used to model the subjectivity. A performance evaluation is carried out to determine the reader’s
preference for emotive vs non-emotive meanings. This chapter establishes the role of subjectivity and
of user inclination towards the meaning that fits their existing cognitive schema.

Chapter 7
Estimating Importance From Web Reviews Through Textual Description and Metrics Extraction
Roney Lira de Sales Santos, University of Sao Paulo, Brazil
Carlos Augusto de Sa, Federal University of Piaui, Brazil
Rogerio Figueredo de Sousa, University of Sao Paulo, Brazil
Rafael Torres Anchiêta, University of Sao Paulo, Brazil
Ricardo de Andrade Lira Rabelo, Federal University of Piaui, Brazil
Raimundo Santos Moura, Federal University of Piaui, Brazil

The evolution of e-commerce has contributed to the increase of the information available, making
the task of analyzing the reviews manually almost impossible. Due to the amount of information,
the creation of automatic methods of knowledge extraction and data mining has become necessary.
Currently, to facilitate the analysis of reviews, some websites use filters such as usefulness votes or
star ratings. However, the use of these filters is not a good practice because they may exclude reviews that
have only recently been submitted to the voting process. One possible solution is to filter the reviews based
on their textual descriptions, author information, and other measures. This chapter proposes
approaches to estimate the importance of reviews about products and services using fuzzy systems and
artificial neural networks. The results were encouraging, with the approaches performing better at detecting
the most important reviews and achieving an F-measure of approximately 82%.

Chapter 8
Discovery of Sustainable Transport Modes Underlying TripAdvisor Reviews With Sentiment
Analysis: Transport Domain Adaptation of Sentiment Labelled Data Set
Ainhoa Serna, University of the Basque Country, Spain
Jon Kepa Gerrikagoitia, BRTA Basque Research and Technology Alliance, Spain

In recent years, digital technology and research methods have advanced natural language processing
for better understanding consumers and what they share in social media. There are hardly any studies in
transportation analysis with TripAdvisor, and moreover, there is no complete analysis from the point
of view of sentiment analysis. The aim of this study is to investigate and discover the presence of sustainable
transport modes, such as walking mobility, underlying non-categorized TripAdvisor texts, in order to
have a positive impact on public services and businesses. The methodology follows a quantitative and
qualitative approach based on knowledge discovery techniques. Thus, data gathering, normalization,
classification, polarity analysis, and labelling tasks have been carried out to obtain a sentiment-labelled
training data set in the transport domain as a valuable contribution to predictive analytics. This research
has allowed the authors to discover sustainable transport modes underlying the texts, focused on walking
mobility but extensible to other means of transport and social media sources.

Chapter 9
Research Journey of Hate Content Detection From Cyberspace
Sayani Ghosal, Ambedkar Institute of Advanced Communication Technologies and Research,
India
Amita Jain, Ambedkar Institute of Advanced Communication Technologies and Research,
India

Hate content detection is the most promising and challenging research area under the natural language
processing domain. Hate speech abuses individuals or groups of people based on religion, caste, language,
or sex. The enormous growth of digital media and cyberspace has encouraged researchers to work on hate
speech detection. A widely accepted automatic hate detection system is required to stop the flow of
hate-motivated data. Anonymous hate content is affecting the young generation and adults on social
networking sites. Through numerous studies and review papers, the chapter identifies the need for
artificial intelligence (AI) in hate speech research. The chapter explores the current state of the art and
prospects of AI in natural language processing (NLP) and machine learning algorithms. The chapter
aims to identify the most successful methods or techniques for hate speech detection to date. Advances
in this research help social media to provide a healthy environment for everyone.

Chapter 10
The Use of Natural Language Processing for Market Orientation on Rare Diseases
Matthias Hölscher, Institute for IT Management and Digitization, FOM University, Germany
Rudiger Buchkremer, Institute for IT Management and Digitization, FOM University,
Germany

Rare diseases in their entirety have a substantial impact on the healthcare market, as they affect a large
number of patients worldwide. Governments provide financial support for diagnosis and treatment.
Market orientation is crucial for any market participant to achieve business profitability. However, the
market for rare diseases is opaque. The authors compare results from search engines and healthcare
databases utilizing natural language processing. The approach starts with an information retrieval process,
applying the MeSH thesaurus. The results are prioritized and visualized using word clouds. In total, the
chapter examines 30 rare diseases and about 500,000 search results in the databases
Pubmed and FindZebra and the search engine Google. The authors compare their results to the search for
common diseases. The authors conclude that FindZebra and Google provide relatively good results
for the evaluation of therapies and diagnoses. However, the quantity of the findings from professional
databases such as Pubmed remains unsurpassed.

Chapter 11
Quality Assurance in Computer-Assisted Translation in Business Environments
Sanja Seljan, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia
Nikolina Škof Erdelja, Ciklopea, Croatia
Vlasta Kučiš, Faculty of Arts, University of Maribor, Slovenia
Ivan Dunđer, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia
Mirjana Pejić Bach, Faculty of Economics and Business, University of Zagreb, Croatia

Increased use of computer-assisted translation (CAT) technology in business settings with augmented
amounts of tasks, collaborative work, and short deadlines gives rise to errors and the need for quality
assurance (QA). The research has three operational aims: 1) to establish a methodological framework for
QA analysis, 2) to evaluate four QA tools comparatively, and 3) to justify the introduction of QA into the
CAT process. The research includes the building of a translation memory, terminology extraction, and the
creation of a terminology base. Error categorization is conducted using the multidimensional quality
metrics (MQM) framework. The error level is calculated considering detected, false, and undetected errors.
Weights are assigned to errors (minor, major, or critical), penalties are calculated, and a quality estimate
for the translation memory is given. Results show that the process is prone to errors due to differences in
error detection, harmonization, and error counting. Data analysis of detected errors leads to further
data-driven decisions related to the quality of output results and improved efficacy of the translation
business process.

Section 3
Diversity Among Languages Over Natural Language Processing

Chapter 12
An Extensive Text Mining Study for the Turkish Language: Author Recognition, Sentiment
Analysis, and Text Classification
Durmuş Özkan Şahin, Ondokuz Mayıs University, Turkey
Erdal Kılıç, Ondokuz Mayıs University, Turkey

In this study, the authors give both theoretical and experimental information about text mining, which
is one of the natural language processing topics. Three text mining problems, namely news
classification, sentiment analysis, and author recognition, are discussed for Turkish. The authors aim to
reduce the running time and increase the performance of machine learning algorithms. Four different machine
learning algorithms and two different feature selection metrics are used to solve these text classification
problems. The classification algorithms are random forest (RF), logistic regression (LR), naive Bayes (NB),
and sequential minimal optimization (SMO). The chi-square and information gain metrics are used as the
feature selection methods. The highest classification performance achieved in this study is 0.895 according
to the F-measure metric. This result is obtained by using the SMO classifier and the information gain metric
for news classification. This study is important in terms of comparing the performances of classification
algorithms and feature selection methods.

Chapter 13
Sentiment Analysis of Arabic Documents: Main Challenges and Recent Advances
Hichem Rahab, ICISI Laboratory, University of Khenchela, Algeria
Mahieddine Djoudi, TechNE Laboratory, University of Poitiers, France
Abdelhafid Zitouni, LIRE Laboratory, University of Constantine 2, Algeria

Today, it is common for consumers to seek out others’ feelings about their purchasing experiences on the
web before even a simple decision to buy a product or a service. Sentiment analysis is intended to help
people benefit from the opinionated texts available on the web in their decision making, and business is
one of its challenging areas. Considerable work on sentiment analysis has been done in English and
other Indo-European languages. Despite the large number of Arabic speakers and internet users,
studies in Arabic sentiment analysis are still insufficient. This chapter aims to present the
main challenges of Arabic sentiment analysis together with the solutions recently proposed in the literature.
The chapter is organized in a novel manner: it first derives the main challenges from the reviewed literature
and then gives the proposed solutions for each challenge. The chapter finds that the
future tendency will be toward rule-based techniques and deep learning, which allow better handling of the
inherent characteristics of the Arabic language.

Chapter 14
Building Lexical Resources for Dialectical Arabic
Sumaya Sulaiman Al Ameri, Khalifa University of Science and Technology, UAE
Abdulhadi Shoufan, Center for Cyber-Physical Systems, Khalifa University of Science and
Technology, UAE

The natural language processing of Arabic dialects faces a major difficulty, which is the lack of lexical
resources. This problem hinders the adoption and commercialization of related technologies such as
machine translation, speech recognition, and sentiment analysis. Current solutions frequently use lexica
that are specific to the task at hand and limited to a particular language variety. Modern communication
platforms, including social media, gather people from different nations and regions. This has increased the
demand for general-purpose lexica for effective natural language processing solutions. This chapter
presents a collaborative web-based platform for building a cross-dialectical, general-purpose lexicon for
Arabic dialects. This solution was tested by a team of two annotators, a reviewer, and a lexicographer.
The lexicon expansion rate was measured and analyzed to estimate the overhead required to reach the
desired size of the lexicon. The inter-annotator reliability was analyzed using Cohen’s Kappa.

This chapter presents a critical review of the current state of natural language processing in Chile and
Mexico. Specifically, a general review is made regarding the technological evolution of these countries
in this area of research and development, as well as the progress they have made so far. Subsequently,
the remaining problems and challenges are addressed. Two are analyzed in detail here: (1)
the lack of a strategic policy that helps to establish stronger links between academia and industry and
(2) the lack of technological inclusion of the indigenous languages, which causes a deep digital divide
between them and Spanish (considered in Chile and Mexico as the official language).

Compilation of References

About the Contributors

Index

Foreword

Natural Language Processing has attracted the attention of scholars since the last century. Thanks to
developments in computer technology and the digitalisation of text materials, linguists have found ample
resources to study. Programming languages have recently been enriched by special NLP libraries,
and third-party tools have become available for NLP analyses. These developments may suggest
that things are much easier than they were before. On the contrary, these developments have increased
the research appetite and encouraged scholars to study more challenging subjects. Today, the works of James
Joyce and Edgar Allan Poe are compared in terms of their lexical, collocation, colligation or adjective
usage. Holy Books are compared with one another to extract similarities. Sentiment analyses are even
done for each verse. As user generated data (UGD) increase, NLP has started to do great things and
has already become a part of artificial intelligence. Machines are trained to understand or perceive what
people really say, taking similes, metaphors and ironies into account. Today, some machines are aware of
the difference between lexical and grammatical ambiguities, and they are still learning. Suffice it to say,
these developments are the ones leading us to web 4.0, which is expected to be a semantic web era. Any
contribution to NLP, small or big, takes us to semantic web times inch by inch.
This book is truly one of them. It gives practical implementations of NLP through text mining, and it
does so using real data corpora rather than toy data sets. Most importantly, user generated data have been
used. By analysing UGD with proper text mining models, naïve users’ opinions have been extracted,
and this is presented very well in the book. This shows that questionnaires and surveys have already
become obsolete when it comes to gathering user feedback. Using text not only in English but also in other
languages like Arabic and Turkish enriches the content of the book.
I congratulate the editors and authors for producing such a great book. I am sure the book will be very
helpful for those who want to see real-life examples of NLP implementations. This is a work which will
broaden the NLP horizons of readers.

Gökhan Silahtaroğlu
Istanbul Medipol Universitesi, Turkey


Preface

The attempt of humankind to interpret signs and images is almost as old as the adventure
of life on earth. The interpretation of complex signals and visions as specific meanings, the survival instinct and
humankind’s dominance over the world all depend on recognition, understanding and interpretation of the world
outside. The communication effort that started with strange sounds, and the self-expression
of people that continued with cave drawings, have undergone many changes over the thousands of years
of the human adventure, with different tools and media in different forms.
The cultural and sociological changes experienced by human beings have affected self-expression,
language tools and language structure. While all of these are important in a world paradigm where
people are at the centre, there has been a transition from this paradigm to a technology-based
communication paradigm over the past 50 years. Based on the new paradigm, this book is prepared as a work
fed by the common point where technology meets language.

LANGUAGE AND HUMANITY

Among the most curious topics in language studies are questions such as how the first human beings
could communicate with each other, how they settled on a universal language, and how they derived
words. The question of whether the paintings on the walls of the Lascaux cave, which survive today,
were made to leave a trace of their makers or as an expression of their dreams still retains its mystery. However, as
one of the two features that distinguish humans from animals, language is unique: it is the
only tool that provides communication with the environment, conveys emotions and thoughts, and
imposes various meanings. Language acquisition, which starts in the family, develops and improves
throughout a person’s education. Language changes with the development and progress of the human being by
adapting to the environment; being dynamic is one of its distinguishing features. Humanity’s interpretation
of language has changed since the Stone Age. Today, human beings can interpret language in the
digital world better with the help of technology.

DIGITAL WORLD

In today’s world, technology and digital tools play an essential role in human life. Human beings
spend part of their days in the digital world with the help of technological tools, communicate
with other people there, and can socialize in a digital “social” world. Understanding and
interpreting the traces that people and businesses leave in this digital world has become one of the significant
issues in today’s business. The whole set of methods referred to as “natural language processing” concerns
processing the languages in which people express themselves with the help of technology and making
inferences from them. These methods aim to make the messages that human beings convey in various ways
understandable to machines and to interpret them.

OBJECTIVE OF THE BOOK

This book aims to address the current state of natural language processing, one of the most critical
issues in recent years, to interpret it in terms of businesses and to present examples from different languages.
The chapters of the book are grouped into three main sections accordingly. The book targets both the
academic community and practitioners.

CONTENT OF THE STUDY

The study is organized into three main sections: a general outlook, business applications and
diversity for NLP.

Section 1: A General Outlook on Natural Language Processing

The first section of the book gives a general outlook on NLP.
This section has five chapters that evaluate NLP in general.

Chapter 1: Academy and Company Needs: The Past and Future of NLP

In the first study, the author evaluates the past and future of NLP with a focus on the academy and the
business world. A brief history of NLP through the years is included in the study, and the author
highlights the importance of collaboration between universities and companies. Some theoretical corners
are also discussed in the study, and a view of the future of NLP is included.

Chapter 2: Exploiting Online Data Sources for the Use of NLP

The second chapter focuses on the data source side of NLP and evaluates data sources in three groups:
online news articles, social media messages and user reviews. The author also presents six main factors
to assist businesses in selecting data sources.

Chapter 3: Natural Language Processing in Online Reviews

Following the introduction of data sources for NLP, this study continues with the subject
of online reviews. The authors emphasize the increasing volume of social media content
and investigate various statistical feature selection methods. In an empirical analysis
on a movie review dataset, three different classifiers are compared.


Chapter 4: Sentiment Analysis as a Restricted NLP Problem

Following the data side of NLP, this chapter focuses on the methodology of sentiment analysis and
evaluates it as a restricted problem. Sentiment analysis is evaluated in this study in terms of types, levels,
techniques and applications. Consistent with the “restricted” emphasis in the title of the study, the authors
also include challenges for sentiment analysis.

Chapter 5: Deep Learning for Sentiment Analysis – An Overview and Perspectives

The final study of the first section examines sentiment analysis with deep learning. Selected
methods and concepts of deep learning are included in the study, and the authors present
how deep models can be used to increase performance in sentiment analysis. The authors also
examine recent developments in the area.

Section 2: Natural Language Processing in Business

Following an overview of the NLP concept, Section 2 focuses on the business side of NLP and includes
various implementations of NLP. There are six chapters in this section.

Chapter 6: Metaphors in Business Applications – Modelling Subjectivity Through Emotions for Metaphor Comprehension

Understanding the meanings hidden in text is crucial for businesses, and some problems may block
this understanding. This study evaluates one of these obstacles, metaphors, in the NLP context and explores
metaphor interpretation in terms of subjectivity. The modelling of subjectivity in the study uses Ekman
emotion categories and pre-trained word embeddings. The performance evaluation considers the
reader’s preference for emotive vs non-emotive meanings.

Chapter 7: Estimating Importance From Web Reviews Through Textual Description and Metrics Extraction

Since web reviews encompass different sub-concepts, different approaches can be used
to evaluate them with NLP methods. This study focuses on web reviews in terms of importance
and uses Fuzzy Systems and Artificial Neural Networks as methodologies. The authors report
approximately 85% performance in detecting the most essential reviews.

Chapter 8: Discovery of Sustainable Transport Modes Underlying in TripAdvisor Reviews With Sentiment Analysis – Transport Domain Adaptation of Sentiment Labeled Data Set

Online reviews are rich in underlying meanings, and different approaches with
different focuses can be used for NLP-based studies. This study evaluates reviews
in the TripAdvisor context for the discovery of sustainable transport modes. The
methodology of the study includes both quantitative and qualitative approaches based on knowledge
discovery techniques.

Chapter 9: Research Journey of Hate Content Detection From Cyberspace

As people produce content and share ideas online, some topics emerge over time, such as hate
content. This study focuses on one of the different sides of NLP: it takes hate content detection
as its subject and aims to identify the most successful techniques and methodologies for hate speech
detection.

Chapter 10: The Use of Natural Language Processing for Market Orientation on Rare Diseases

This study takes rare diseases as its subject; 30 rare diseases and about 500,000 search results
are examined. It is concluded that FindZebra and Google provide good results regarding the evaluation
of therapies and diagnoses, while the quantity of findings from professional databases is still unsurpassed.

Chapter 11: Quality Assurance in Computer-Assisted Translation in Business Environment

Computer-assisted translation is the subject of this study, which has three aims: presenting
a methodological framework for quality assurance analysis, comparatively evaluating four quality
assurance tools, and justifying the introduction of quality assurance into the computer-assisted translation
process. It is concluded that the process is prone to errors because of the differences in error detection,
harmonization and error counting.

Section 3: Diversity Among Language Over Natural Language Processing

After the business implementations of NLP in several aspects, the final section of the book includes four
chapters that evaluate different languages and areas for NLP.

Chapter 12: An Extensive Text Mining Study for the Turkish Language –
Author Recognition, Sentiment Analysis, and Text Classification

Employing different languages as a focus of NLP studies is essential, and Turkish is one of the most
spoken languages in the world. This study focuses on the Turkish language and combines three different
problems in its methodology: author recognition, sentiment analysis and text classification. The
study uses four machine learning algorithms and two feature selection metrics.

Chapter 13: Sentiment Analysis of Arabic Documents – Main Challenges and Recent Advances

The study focuses on the Arabic language, another of the most spoken languages in the world,
and highlights the insufficiency of Arabic NLP studies. Main challenges and recent advances regarding
Arabic NLP are discussed in the chapter, which concludes that the future tendency is toward rule-based
techniques and deep learning.

Chapter 14: Building Lexical Resources for Dialectical Arabic

This study also evaluates the Arabic language, but it differs from the previous study by including a
methodological approach. The study evaluates the main steps of building lexical resources for dialectical
Arabic. A collaborative web-based platform to build a cross-dialectical and general-purpose lexicon for
Arabic dialects is discussed in the chapter.

Chapter 15: A Landscape About the Advances in Natural Language Processing in Chile and México

Natural language processing is affected by technological advances and country-based advancements.
The final study of the book examines the advances in NLP for Chile and México. The study concludes
that there is a lack of a strategic policy to help establish stronger links between academia and industry
and a lack of inclusion of indigenous languages.

CONTRIBUTION AND CONCLUSION

The three issues related to NLP addressed in this book project are: i) a general outlook of NLP, ii) NLP
and business, iii) diversity of languages for NLP. As the content and audience of each issue (addressed
as sections in this project) differ in their characteristics, the impact of the sections differs, too.
The first section, which gives a general outlook of NLP, would be useful for newcomers to NLP research.
The section presents an overview of different topics such as academy and company needs,
online reviews and sentiment analysis. Readers can use this section to get brief information
about what NLP means from a general perspective. The second section (consistent with the title of the
book, “NLP in Global and Local Business”) is related to the business side of NLP, since one of the main
areas of this project is the business implementation of NLP. The studies in the section cover
different business practices related to NLP, and readers can use each study for the purpose they
focus on. Finally, the last section of our book project addresses the diversity side of NLP, as the technique
includes various languages in its scope. Evaluating different languages and regions is crucial for
improving the usage of NLP for other languages.
This book project includes studies for both the academic community and private sector practitioners.
Readers can use this book both to gain general information about NLP and to examine sectoral practices.


Acknowledgment

First of all, we would like to thank our families, who support us in every minute of our lives;
our precious teachers and professors, who form the basis for our projects including this study;
Istanbul Medipol University and its employees for the technical equipment and infrastructure support;
Mehmet Hulusi Ekren and Enver Sait Kurtaran, who helped with the
promotion of the book; and lastly Zeynep Türkyılmaz and Gökhan Silahtaroğlu for their
precious contributions.

Our special thanks go to our dear writers from different regions of the world (Algeria, Brazil, Chile, Croatia,
France, Germany, India, Malta, Slovenia, Spain, Turkey, UAE), 38 authors from 12 countries, who are the
real architects of this wonderful book project and who patiently fulfilled our many revision requests, justifying
our meticulous work.

Section 1
A General Outlook on Natural
Language Processing

Chapter 1
Academy and Company Needs:
The Past and Future of NLP

Tiago Martins da Cunha

UNILAB, Brazil

ABSTRACT
This chapter presents a view of how the use of NLP knowledge might change the relation between
universities and companies. Products from NLP analysis are expected at both ends of this at times not so
reciprocal exchange. But history has shown that the products developed by universities and companies are
complementary for the development of NLP. The great volume of data the world is producing requires
newer perspectives to provide understanding. These newer aspects found in big data may provide an
understanding of human language categorization and therefore possibly of human language acquisition.
But to process data, more data need to be produced, and not all companies have the time to dedicate to
this task. This chapter aims to show, through a literature review and experience in the field, that
partnerships are the most reliable resource for the cycle of knowledge production in NLP. Companies
need to be receptive to the theoretical knowledge the university may provide, and universities must turn
their theoretical knowledge toward a more applied environment.

INTRODUCTION

As a researcher in Computational Linguistics focusing on Machine Translation (MT), I had the opportunity
to work on a project, born of a partnership between a mobile company and a university, on the creation
of a mobile personal assistant. This great interdisciplinary team had the opportunity to put some of the
theoretical architecture of my doctoral thesis on hybrid MT into practice. The architecture was naive,
but the outcome was greater than expected.
With the combined use of a statistical algorithm and the creation of rules based on what was learned from
the use of the prototype application, the results were almost scary. The same architecture on two different
mobile phones, with two different teams training the data in different contexts, made the same system
produce two different outcomes, almost like a personality. But would this be the reach of singularity in
Artificial Intelligence (AI)? Probably a lot more work would need to be done to be even talking about
DOI: 10.4018/978-1-7998-4240-8.ch001

Copyright © 2021, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.


it. But this research reached something great in a very short period of time due to its interdisciplinary
approach and the high quality of its team.
This project opportunity, along with the experience of a linguistics professor, made me wonder what
the future might promise for NLP. In the university, unlike in that
mobile company project, the rhythm at which resources and data are produced is very different. Yet
it is those academic resources that lay the groundwork for company projects like the one mentioned.
So, this got me thinking about how these different rhythms are related to the future of NLP.
Much has been said about the future of NLP. On the applications side, the popular interest in bots and
personal assistants brings science fiction closer to our everyday life. Personal assistants and conversational
agents designed to tutor or assist with tasks are provided for many everyday activities. However, the range
of understanding of popular conversational agents and personal assistants is very limited. It is limited to
a controlled language expectation. And the specific language frame within the expected context is not a
controlled language.
Real-world discourse is very broad in meaning and context. Humans are designed or trained,
depending on your theoretical belief, to understand language. Four years of our lives are spent mastering
a mother tongue. Machines do not have this natural design. The level of language understanding an
illiterate human has is vastly greater than that of most complex AI systems. The struggle to manage
unstructured data is the real challenge in NLP research. Struggling less is the key to success in
such research. You may even find satisfaction in the implementation of computer-readable resources
that may provide the desired range of reachable language analysis for some NLP tools.
The volume of data is increasing every day. Kapil, Agrawal & Khan (2016) affirm that the volume
of data increased 45% to 50% in the last two years and will grow from 0.8 ZB to 35 ZB by 2020. And
the biggest portion of it is text. So, many researchers focus on techniques for analyzing this
great volume of data. Big Data, it is called. Although this focus may be more than necessary, the effort
may be applied at the wrong end of the spectrum of information. Improving analysis requires reliable
computational linguistic resources to get our control data from. Although a narrowed view may be obtained
by analyzing a variety of texts with probabilistic models, these resources may require a more
subjective point of view. But how can these subjective analyses be implemented in such a technical field?
Well, that’s what AI is all about. But not just machine-to-machine analysis. By machine to machine,
I mean the use of evolutionary or genetic algorithms that produce stages of analysis not readable
by humans. Understand, I’m not saying not to use statistical methods to analyze language data, but to
produce readable stages that humans can sift through. The key may lie in building hybrid methods:
the use of statistical approaches to build rule-based engines that could be groomed by language experts.
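The hybrid idea described above can be illustrated with a minimal sketch. It is not the architecture used in the project mentioned earlier; the tiny tagged sample, the function names `induce_rules` and `apply_rules`, and the confidence threshold are all assumptions made for illustration. Counts from data propose word-to-tag rules in a form a person can read, and a language expert can then override or extend them.

```python
from collections import Counter, defaultdict

# Tiny hand-labelled sample, invented for illustration: (word, part-of-speech tag).
TAGGED = [
    ("the", "DET"), ("book", "NOUN"), ("is", "VERB"), ("open", "ADJ"),
    ("book", "VERB"), ("the", "DET"), ("flight", "NOUN"),
    ("the", "DET"), ("book", "NOUN"), ("fell", "VERB"),
]


def induce_rules(tagged, min_confidence=0.6):
    """Statistical stage: keep the most frequent tag per word if it is frequent enough."""
    counts = defaultdict(Counter)
    for word, tag in tagged:
        counts[word][tag] += 1
    rules = {}
    for word, tag_counts in counts.items():
        tag, freq = tag_counts.most_common(1)[0]
        confidence = freq / sum(tag_counts.values())
        if confidence >= min_confidence:
            # Each rule is a plain, human-readable record an expert can inspect.
            rules[word] = {"tag": tag, "confidence": round(confidence, 2)}
    return rules


def apply_rules(words, rules, overrides=None):
    """Rule stage: expert overrides (hypothetical) take priority over induced rules."""
    merged = {word: rule["tag"] for word, rule in rules.items()}
    merged.update(overrides or {})
    return [(word, merged.get(word, "UNK")) for word in words]
```

A call such as `apply_rules(["the", "book", "vanished"], induce_rules(TAGGED), overrides={"vanished": "VERB"})` shows the intended division of labor: the statistics propose the rules, and the expert fills in or corrects what the data could not settle.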
However, the rules that have been mentioned here are not just syntactic or semantic, but cognitive
as well. And they are not separated from each other either. Syntactic and semantic theories have been broadly
used in NLP due to the extensive testing of their structured formats. The more these theories interact, the
more satisfying the results they may show. The limitations of syntactic and semantic systems have
already shown themselves to be problematic. The accuracy of such approaches has reached a limit that many
researchers have struggled to break through in broad contexts.
Many big corporation systems, despite their great success, have reached a dead end in data analysis.
The construction of a specific context framework might not always be worth the work, even though it may
be the only solution for some of these big corporations’ problems. In the case of mobile applications, the
problem now is not the specific context but the opposite.


A short sentence said to a personal assistant on the phone could or should be
analyzed in a wide variety of contexts. For machines, a short sentence provides too little information and too
little data to analyze. But how do we, humans, do it? Humans are capable of evaluating the context
through our life experience. I’m not saying life should be given to machines, but experience. Humans sort
a new experience, compare it with previous similar ones, and then choose among them. The choice is not always
made correctly; thus, making mistakes is human. But mistakes are learned from. My humble
opinion is that much of this analysis that relies on linguistic theory must be renewed. Recent work in
Neurosciences, Psycholinguistics and Cognitive Linguistics must be incorporated in the implementation
of Corpus, Ontology and Grammar Building, for instance.
The rise of these fields of study in NLP and their wide variety of implementation contexts might change
the perspective of resources created today. The investment in computing human behavior, social
interactions and structural possibilities needs to meet our primitive psychological/biological needs that are
represented as language. Conceptual Metaphors, one of the main issues in Cognitive Linguistics, may
be one of the options to drag some syntactic/semantic analysis out of stagnation. The enhancement
of Corpus data with linguistic information adopting a cognitive perspective may be a solution for many
systems, but that’s not the near future of NLP.
The future of NLP might be seen at the undergraduate stage. Theoretical computing knowledge should be
basic during a university major. Universities enabling the creation of these enhanced resources will be
the next step. The much-desired interdisciplinary behavior must be stimulated. The blended knowledge from
different university majors must be reinforced so that it can navigate without the restrictions of departments. Also,
the linguistics curriculum needs to be adapted to provide a more integrated view of the diverse
disciplines within linguistic knowledge, such as Phonetics, Morphology, Syntax, Semantics and
Pragmatics, so that they interface more freely with one another. Computational Linguistics needs to enter these curricular
components not just to provide theory or to test it but to produce data, computational linguistic data.
New approaches to part-of-speech tagging, new criteria for tagset implementation, efficient
syntactic-semantic grammar formalism engineering, specific context word framing, new declarative forms of
accessing these frames, new theoretical approaches for corpus building (based not just on human
production but on human reasoning), conceptual ontology creation, and context-specific sentiment analysis
data training are some of the new types of resources the world of NLP still requires. Investing in
partnerships with academic groups early on and creating a productive mindset in these youngsters
might be the way to come up with creative solutions to our stagnated results.
The aim of this chapter is to present how NLP has been developing over the years and how it might
unfold within related areas. First, the brief development of NLP through history and in
the last decade is presented. A reflection on the ups and downs that research on NLP has had over the
years and their social and theoretical issues is shared. Then some theoretical corners this
field of study must turn to keep on improving are discussed. And, finally, a view of what the future of NLP
may bring to the interface of theory and practice is presented.


NLP RESEARCH CONTEXT

Brief History: Beginning With Translation

The development of NLP is not a linear story to present. It can be represented by different branches of
study and paths of investigation. A brief illustration is presented in Figure 1, which is clarified throughout
this section. This figure does not illustrate intersections with linguistic, mathematical and computational
developments.

Figure 1. NLP brief history timeline

The birth of NLP in the early 50’s was within Translation Studies. Its quest to provide “perfect” MT
results produced NLP’s greatest achievements and also its greatest disappointments. The
perfect quality of a translation is something arguable even among professional human translators.
Determining the level of quality of a translation depended on the amount of reference translations it could be compared
with. Managing and pairing the translation products that serve as references has been one of the great
challenges in MT systems. Since the beginning, the development of NLP has depended on massive human
effort, mainly to produce machine-readable material or to evaluate systems.
World War II and the Cold War were largely responsible for the initial development of research in
this area (GARRÃO, 1998). The Americans and the English, in spying on Soviet intelligence agencies to get
inside information as quickly as possible, developed a scientific calculator with enough data to perform
literal word-for-word translations of texts, without considering syntactic or semantic aspects
(LOFFLER-LAURIAN, 1996). Hutchins (2003) states that machine translation projects consisted of huge
dictionaries that performed direct translation with statistical analysis. There were no linguistic theories,
at the time, to support the design of translation systems (LOFFLER-LAURIAN, 1996). Many theorists
have questioned the efficiency of information manipulation by these automatic systems.
From the beginning, NLP was developed through “brute force”, as Wilks (2008) would say. The manual
labor of dictionary construction was the basis of the first statistical methodologies for language
processing (HUTCHINS, 2003). At this time the theoretical issues of dictionary making were too young, and
Lexicography and Terminology had not yet established themselves as fields of study. With the advent
of Artificial Intelligence, scholars designed small programs that illustrated translational possibilities.
Soon afterwards, grammar and syntax information was introduced to speed up and improve the automatic query
(GARRÃO, 1998). According to Wilks (2009), AI’s role in MT was mainly theoretical in nature. The
logic system of calculators was not sufficient to solve the logical elements in language.


Setbacks in Development

In 1966, a report by the Automatic Language Processing Advisory Committee (ALPAC) made
statements discrediting the potential of MT to produce good quality translations (HUTCHINS, 2003). These
statements sowed skepticism about this field of study, leading to a general cut in government funding for
such research (SANTOS, 2012). Even without financial support, systems such as SYSTRAN and TAUM,
from Montreal University, continued to manually perform translations and catalog them in their
systems. This example-based machine translation approach reuses existing translations as the basis for
new translations (SOMERS, 2003). This example-based approach is the backbone of one of the most
widely used professional translation tools, the Translation Memory (TM) system.
TM systems are not MT systems but computer-assisted translation tools. Every translation they offer is a
suggestion for the user to choose from while editing the translation manually. Even though TMs are not MT
systems, the two share similar architectures for producing user suggestions. Groves (2005a) discusses
several hybrid approaches to data-driven MT, such as statistical and example-based systems, and investigates
different ways of linking syntactic knowledge with statistical data. Using a TM system requires the user to
first feed the system with translations, which makes the pairing of source and target segments more reliable.
Other MT systems do the pairing without supervision. MT products must nevertheless go through an evaluation
process to rank their efficiency, and that requires human professionals to judge the quality of the
translations automatically generated by the MT system. The evaluation of MT, as of any NLP system, requires
human attention.
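The suggestion mechanism of a TM system described above can be illustrated with a minimal sketch. It assumes a tiny in-memory store of segment pairs and uses the character-based similarity ratio from Python's standard difflib module as a stand-in for a fuzzy-match score; the sample segments, the function names and the 0.7 threshold are hypothetical, and real TM tools use their own matching schemes.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segments paired with stored human translations.
MEMORY = [
    ("The printer is out of paper.", "A impressora está sem papel."),
    ("The printer is out of ink.", "A impressora está sem tinta."),
    ("Restart the computer.", "Reinicie o computador."),
]


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1 for two segments, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest(segment: str, memory=MEMORY, threshold: float = 0.7):
    """Return (score, source, target) suggestions at or above the threshold, best first."""
    scored = [(similarity(segment, src), src, tgt) for src, tgt in memory]
    matches = [m for m in scored if m[0] >= threshold]
    return sorted(matches, key=lambda m: m[0], reverse=True)


if __name__ == "__main__":
    # The translator still chooses among the suggestions and edits the result by hand.
    for score, src, tgt in suggest("The printer is out of toner."):
        print(f"{score:.2f}  {src}  ->  {tgt}")
```

The system only proposes stored translations; the choice and the final wording remain with the human user, which is what separates a TM from an MT system.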
New developments in NLP still require a great amount of human collaboration, which can be acquired
through academic involvement or enterprise financing. Back in the 1970s, the poor translation
ratings of MT systems resulted in skepticism and, subsequently, in enterprises fleeing academic
partnerships. Without external financing, most research was carried out within academia, as in the
case of TAUM at the University of Montreal. Later, thanks to developments in linguistic and mathematical
theory, good results started to come, and enterprise interest followed soon after.
Lexicography is a good example of a theoretical development that made a practical, real-world difference.
Lexicography is the discipline responsible for describing and analyzing the vocabulary of a language and for
the craft of compiling, writing and editing dictionaries according to semantic, syntactic and pragmatic
relationships (BÉJOINT, 2000). Due to technological limitations, lexicographers rarely shared their views
with colleagues in the 1960s (ZGUSTA, 2010).

New Theories to Approach Data

However, with globalization this discipline was among the first to organize its principles
and views. Lexicography was one of the first disciplines to provide data and to organize them
according to theoretical criteria. Following the organization provided by Lexicography, though emerging
after a longer historical gap, Corpus Linguistics gave a further boost to NLP theories and analysis. Corpus
Linguistics appeared so powerful that IBM announced, back in the 1950s at the very beginning of Corpus
Linguistics advances, that a supercomputer running the Brown Corpus would soon master the English language.
This was shortly afterwards corrected as an academic prank (MASTERMAN & KAY, 1959). Since then, much of
the process has evolved, first of all from paper punch cards to hard drives. But the philosophy behind
Corpus Linguistics has not changed much.
According to Sardinha (2004), corpus linguistics is the science that deals with the collection and
exploration of corpora, or carefully collected sets of textual linguistic data, for the purpose of
researching a language or linguistic variety. According to Oliveira (2009), this area represents a new
philosophical approach to language studies and can be considered the “modern face” of empirical
linguistics. In corpus linguistics, the process of selecting texts to compose the corpus and arranging
them in the database is very important. There are several ways to compose a corpus. The corpus can
be composed of texts in one language or in several languages, in which case it is called a multilingual corpus.
A multilingual corpus can be aligned in parallel. According to Simões (2004), a parallel corpus can
be aligned at different levels, e.g. paragraphs, sentences, segments, words and even characters. A
parallel corpus can be used for different purposes, such as foreign language teaching, terminology
studies, information retrieval systems and MT.
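As an illustration of sentence-level alignment, the sketch below pairs the sentences of two short texts and counts which words co-occur across aligned pairs, a crude first step toward the terminology and MT uses mentioned above. It assumes, purely for simplicity, that both texts have the same number of sentences in the same order; real aligners must handle merged, split and missing sentences. All function names and sample texts are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split on whitespace that follows sentence-final punctuation (a naive heuristic)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokens(sentence):
    """Lowercased word tokens of a sentence."""
    return re.findall(r"\w+", sentence.lower())


def align_sentences(source_text, target_text):
    """Pair sentences one-to-one; assumes both texts have equally many sentences."""
    src = split_sentences(source_text)
    tgt = split_sentences(target_text)
    if len(src) != len(tgt):
        raise ValueError("this sketch only handles one-to-one sentence alignment")
    return list(zip(src, tgt))


def cooccurrences(pairs):
    """Count how often a source word and a target word appear in the same aligned pair."""
    counts = Counter()
    for src, tgt in pairs:
        for sw in set(tokens(src)):
            for tw in set(tokens(tgt)):
                counts[(sw, tw)] += 1
    return counts


if __name__ == "__main__":
    english = "The house is small. The house is old."
    portuguese = "A casa é pequena. A casa é velha."
    pairs = align_sentences(english, portuguese)
    counts = cooccurrences(pairs)
    print(pairs)
    print(counts[("house", "casa")], counts[("small", "velha")])
```

Word pairs that recur across many aligned sentences are candidate translation equivalents, while pairs that never share a sentence pair receive a count of zero.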
The Brown Corpus is one of the most popular corpora in the world. Its format has served as a template for
new corpora since its construction. It is a million-word collection drawn from 500 written texts of
different genres (JURAFSKY and MARTIN, 2009). Initially the Brown Corpus contained only the words
themselves. After several years, Part-of-Speech (POS) tagging was applied (GREENE and RUBIN, 1971), and the
tagging has since been extensively revised. Today the Brown Corpus serves as a model
for the design of new corpora not only in its format but also in its linguistic resources. Much linguistic
enrichment work has been done on it. POS tagging has added a morphosyntactic label to each word, and
the tagset has been replicated in many different corpora in many different languages.
POS annotation is commonly done with probabilistic models. Today the state of the art in
accuracy of probabilistic POS tagging is over 96%. Even so, linguistic enrichment through POS tagging
requires careful revision to turn the annotated corpus into a gold standard. Once at the gold-standard
stage, the corpus can be used as a model for automatically assigning POS tags to new texts or corpora.
Although POS tagging can be done in a variety of ways, statistical models are the most commonly used.
Statistical implementations of algorithms have accompanied NLP since its beginnings with MT calculators.
Probabilistic models are crucial for capturing every kind of linguistic knowledge (JURAFSKY and
MARTIN, 2009). Hidden Markov Models are among the most widely used in the field, and the advantage
of probabilistic approaches is their robustness: they can resolve any kind of ambiguity and always provide
a possible answer. Remember that a possible answer is not necessarily a reliable one.
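To make the idea concrete, the following is a minimal sketch of a bigram Hidden Markov Model tagger: transition and emission counts are estimated from a tiny hand-made "gold-standard" sample, and the Viterbi algorithm picks the most probable tag sequence for a new sentence. The three-sentence corpus, the coarse tag names and the add-one smoothing are assumptions made for illustration, not the setup of any particular tagger.

```python
import math
from collections import Counter, defaultdict

# Hypothetical gold-standard sample: sentences as lists of (word, tag) pairs.
GOLD = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]


def train(corpus):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # previous tag -> counts of the next tag
    emit = defaultdict(Counter)   # tag -> counts of emitted words
    vocab = set()
    for sentence in corpus:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability, so unseen events never get probability zero."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence for the words under the model."""
    trans, emit, tags, vocab = model
    if not words:
        return []
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 reserves mass for unseen words
    # Each column maps a tag to (best log score ending in that tag, previous tag).
    columns = [{
        t: (log_prob(trans["<s>"], t, n_tags) + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (columns[-1][p][0] + log_prob(trans[p], t, n_tags), p) for p in tags
            )
            column[t] = (score + log_prob(emit[t], word, n_words), prev)
        columns.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    model = train(GOLD)
    print(viterbi(["the", "dog", "sleeps"], model))
```

Because of smoothing the tagger always returns some tag sequence, even for words it has never seen, which illustrates the point that a possible answer is not necessarily a reliable one.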
It should be stressed once again that statistical POS tagging is not the only way to achieve
the linguistic enrichment of a corpus. State machines and rule-based systems are also capable
of annotating new texts or different corpora, but their implementations are not as popular as the
statistical ones: constructing and implementing the rules is extensive and tiring expert work, although
the results can be up to 100% reliable.
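For contrast with the statistical approach, a rule-based tagger can be sketched as a small lexicon, an ordered list of suffix rules and one contextual correction rule. Every rule, tag name and default below is a hypothetical toy chosen for illustration; a usable system would need a far larger, expert-written rule set, which is precisely the cost noted above.

```python
import re

# Hypothetical closed-class lexicon: words whose tag is fixed.
LEXICON = {"the": "DET", "a": "DET", "an": "DET", "of": "PREP", "in": "PREP"}

# Hypothetical suffix rules, tried in order; the first matching pattern wins.
SUFFIX_RULES = [
    (re.compile(r".+ly$"), "ADV"),
    (re.compile(r".+ing$"), "VERB"),
    (re.compile(r".+ed$"), "VERB"),
]

DEFAULT_TAG = "NOUN"  # fallback when no lexicon entry or suffix rule applies


def tag(words):
    """Tag words with lexicon lookup, suffix rules and one contextual rule."""
    tags = []
    for word in words:
        lower = word.lower()
        if lower in LEXICON:
            tags.append(LEXICON[lower])
        else:
            guess = next((t for pattern, t in SUFFIX_RULES if pattern.match(lower)), DEFAULT_TAG)
            tags.append(guess)
    # Contextual rule: a word guessed as VERB directly after a determiner is re-tagged NOUN.
    for i in range(1, len(tags)):
        if tags[i] == "VERB" and tags[i - 1] == "DET":
            tags[i] = "NOUN"
    return list(zip(words, tags))


if __name__ == "__main__":
    print(tag(["The", "building", "collapsed", "quickly"]))
```

Unlike a probabilistic tagger, every decision here can be traced to a specific rule, so errors are corrected by editing rules rather than by retraining.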
POS tagging is one of the processes in the NLP pipeline for linguistically enriching a corpus. Other
stages can include syntactic annotation, thematic roles and lexical semantic annotation. Syntactic
information is commonly stored in treebanks. The Penn Treebank is a highly enriched corpus that
gathers not only syntactic information but sense annotation as well (PRASAD et al., 2008).
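Treebanks in the Penn style store each parse as a bracketed string. The sketch below, which assumes a single hand-written example tree rather than real treebank data, reads such a string into nested tuples and lists the phrase-structure rules it contains; the function names are hypothetical.

```python
import re
from collections import Counter


def parse(bracketed):
    """Read a bracketed tree such as "(S (NP (DT the)) ...)" into (label, children) tuples."""
    toks = re.findall(r"\(|\)|[^\s()]+", bracketed)
    pos = 0

    def read():
        nonlocal pos
        if toks[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = toks[pos]
        pos += 1
        children = []
        while toks[pos] != ")":
            if toks[pos] == "(":
                children.append(read())     # nested constituent
            else:
                children.append(toks[pos])  # word at a leaf
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read()


def productions(tree):
    """List (parent, right-hand side) rules in top-down, left-to-right order."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    rules = [(label, rhs)]
    for child in children:
        if isinstance(child, tuple):
            rules.extend(productions(child))
    return rules


if __name__ == "__main__":
    tree = parse("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))")
    for parent, rhs in Counter(productions(tree)):
        print(parent, "->", " ".join(rhs))
```

Counting such rules over a whole treebank gives the raw material for a grammar, with rule frequencies available as weights.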
The interface between Syntax and Semantics in a corpus helps all sorts of parallel studies. A syntactically
annotated corpus can serve as a sample for grammar extraction, and the extracted grammar may in turn serve
as a grammar corrector or speech identifier. Linguistic information at different levels of depth can be used
as a resource for a number of intelligent systems. Other systems that rely on decision making, and that
through the use of different models suggest intelligent machines, are Conversational Agents or Dialogue
systems, such as chatterbots. Chatterbots are systems that seek to interact and maintain human communication.
Alan Turing, in his famous paper “Computing Machinery and Intelligence”, which opens with the question
“Can machines think?”, proposes the philosophical task of challenging a human to maintain
a conversation with an unknown subject and then guess whether the interaction happened with a human or
