Introduction to Big Data. Overview of the Apache Hadoop ecosystem with examples of storing, processing, and analyzing large volumes of data
- Data Science, Big Data
- Accepted
November 14, 12:00
Room V
Introduction to Big Data. Overview of data sources. Approaches to data integration. The data quality problem. Types of processing and scaling. Organizational and methodological problems of creating a corporate data warehouse. The concept and architecture of Apache Hadoop. Hadoop cluster management. Mastering the main components of Hadoop: Cloudera Manager, HDFS, YARN, Oozie, HUE, Pig, HCatalog, Hive, Impala. Creating and launching analytical applications with the MapReduce and Spark frameworks, using the Eclipse IDE, pyspark, spark-shell, Cloudera Workbench Manager, and the programming languages Java, Python, and Scala.
Mikhail Belov
PhD, Dubna State University
Scientific Director of the master’s degree program «Business Analytics and Big Data Systems», associate professor
Mikhail Belov is an expert and empirical scientist in the field of information technology. He leads and provides scientific guidance for the master’s degree program «Business Analytics and Big Data Systems», whose graduates are in high demand not only in the high-tech industry but also at leading research organizations, including, among others, the Joint Institute for Nuclear Research (JINR) and the European Organization for Nuclear Research (CERN). For over 18 years he has taught master’s-level and MBA classes at Dubna State University, HSE, and MEI, and he has supervised more than 200 bachelor’s and master’s theses. As director of the telecommunication center (CTO), he developed the IT infrastructure of Plekhanov Russian University of Economics. He was the first in Russia to create and implement a virtual computer lab based on the principles of entropy and self-organization. He played a leading role in the formation and development of a scientific school for the practical training of IT professionals, one that enables remote development and adoption of multicomponent information systems using cloud computing technologies.
Founder and developer of Dictutor, a project that aims to improve foreign language acquisition and partially reduce digital inequality in educational technologies in more than 100 countries around the world.