The complete Introduction to Big Data course is currently offered by UC San Diego through the Coursera platform.
Learning Outcomes for the Introduction to Big Data Course!
At the end of this course, you will be able to:
* Describe the Big Data landscape, including examples of real-world big data problems and the three key sources of Big Data: people, organizations, and sensors.
* Explain the V’s of Big Data (volume, velocity, variety, veracity, valence, and value) and why each impacts data collection, monitoring, storage, analysis and reporting.
* Get value out of Big Data by using a 5-step process to structure your analysis.
* Identify what is and what is not a big data problem, and be able to recast big data problems as data science questions.
* Provide an explanation of the architectural components and programming models used for scalable big data analysis.
* Summarize the features and value of core Hadoop stack components, including the YARN resource and job management system, the HDFS file system, and the MapReduce programming model.
Instructors for the Introduction to Big Data Course!
- Ilkay Altintas
- Amarnath Gupta
Skills You Will Gain
- Big Data
- Apache Hadoop
- MapReduce
- Cloudera
Also Check: How to Apply for Coursera Financial Aid

Intro to MapReduce Quiz Answers
Question 1: What does IaaS provide?
- Hardware Only
- Software On-Demand
- Computing Environment
Answer: Hardware Only

Question 2: What does PaaS provide?
- Hardware Only
- Computing Environment
- Software On-Demand
Answer: Computing Environment

Question 3: What does SaaS provide?
- Hardware Only
- Computing Environment
- Software On-Demand
Answer: Software On-Demand
Question 4: What are the two key components of HDFS and what are they used for?
- NameNode for block storage and DataNode for metadata.
- NameNode for metadata and DataNode for block storage.
- FASTA for genome sequence and Rasters for geospatial data.
- For gene sequencing calculations.
Answer: NameNode for metadata and DataNode for block storage.

Question 5: What is the job of the NameNode?
- Coordinates operations and assigns tasks to DataNodes.
- Listens to DataNodes for block creation, deletion, and replication.
Answer: Coordinates operations and assigns tasks to DataNodes (see the sketch below).
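To make the split from Question 4 concrete, here is a minimal Python sketch of the idea, assuming toy class names (NameNode, DataNode) and a tiny block size; it only illustrates the metadata versus block-storage division and is not the real HDFS code or API.

```python
BLOCK_SIZE = 8  # toy block size in characters; real HDFS defaults to 128 MB

class DataNode:
    """Holds actual block contents, keyed by block id."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}

class NameNode:
    """Holds only metadata: file path -> list of (block id, DataNode)."""
    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.metadata = {}

    def write(self, path, data):
        entries = []
        for i in range(0, len(data), BLOCK_SIZE):
            block_id = f"{path}#blk{i // BLOCK_SIZE}"
            node = self.datanodes[(i // BLOCK_SIZE) % len(self.datanodes)]
            node.blocks[block_id] = data[i:i + BLOCK_SIZE]  # bytes live on the DataNode
            entries.append((block_id, node))
        self.metadata[path] = entries  # the NameNode keeps only this mapping

    def read(self, path):
        # Look up the block locations, then fetch each block from its DataNode.
        return "".join(node.blocks[bid] for bid, node in self.metadata[path])

cluster = [DataNode(f"datanode{i}") for i in range(3)]
namenode = NameNode(cluster)
namenode.write("/user/demo.txt", "hello hadoop distributed file system")
print(namenode.read("/user/demo.txt"))
```

Notice that the NameNode never touches file contents: reads and writes go through its metadata table, but the actual blocks live on the DataNodes.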
Question 6: What is the order of the three steps to MapReduce?
- Map -> Shuffle and Sort -> Reduce
- Shuffle and Sort -> Map -> Reduce
- Map -> Reduce -> Shuffle and Sort
- Shuffle and Sort -> Reduce -> Map
Answer: Map -> Shuffle and Sort -> Reduce (see the sketch below).
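The three phases from Question 6 can be simulated in a few lines of plain Python. This word-count sketch uses hypothetical function names (map_phase, shuffle_and_sort, reduce_phase) to mirror the pipeline; in real Hadoop the framework performs the shuffle and sort between your mapper and reducer.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every input record.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_and_sort(pairs):
    # Shuffle and Sort: bring all pairs with the same key together,
    # in sorted key order (handled by the framework in real Hadoop).
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))

def reduce_phase(grouped):
    # Reduce: sum the counts for each distinct word.
    for word, group in grouped:
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    docs = ["my apple is red", "my grape is green and my pear is green"]
    for word, total in reduce_phase(shuffle_and_sort(map_phase(docs))):
        print(word, total)
```

Running it prints each distinct word with its total count, the same output as the classic Hadoop word-count example.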
Question 7: What is a benefit of using pre-built Hadoop images?
- Guaranteed hardware support.
- Less software choices to choose from.
- Quick prototyping, deploying, and validating of projects.
- Quick prototyping, deploying, and guaranteed bug free.
Answer: Quick prototyping, deploying, and validating of projects.
Question 8: What is an example of open-source tools built for Hadoop and what does it do?
- Giraph, for SQL-like queries.
- Zookeeper, analyze social graphs.
- Pig, for real-time and in-memory processing of big data.
- Zookeeper, management system for animal-named related components.
Answer: Zookeeper, management system for animal-named related components.
Question 9: What is the difference between low-level and high-level interfaces in the Hadoop ecosystem?
- Low level deals with storage and scheduling while high level deals with interactivity.
- Low level deals with interactivity while high level deals with storage and scheduling.
Answer: Low level deals with storage and scheduling while high level deals with interactivity.
Question 10: Which of the following are problems to look out for when integrating your project with Hadoop? (Select all that apply.)
- Random Data Access
- Data Level Parallelism
- Task Level Parallelism
- Advanced Algorithms
- Infrastructure Replacement
Answer: Random Data Access, Task Level Parallelism, Advanced Algorithms, Infrastructure Replacement
Question 11: Which of the following are the main goals of Hadoop? (Select all that apply.)
- Enable Scalability
- Handle Fault Tolerance
- Provide Value for Data
- Latency Sensitive Tasks
- Facilitate a Shared Environment
- Optimized for a Variety of Data Types
Answer: Enable Scalability, Handle Fault Tolerance, Provide Value for Data, Facilitate a Shared Environment, Optimized for a Variety of Data Types
Question 12: What is the purpose of YARN?
- Implementation of Map Reduce.
- Enables large scale data across clusters.
- Allows various applications to run on the same Hadoop cluster.
Answer: Allows various applications to run on the same Hadoop cluster.

Question 13: What are the two main components of YARN?
- Node Manager and Container
- Resource Manager and Container
- Applications Master and Container
- Node Manager and Applications Master
- Resource Manager and Node Manager
Answer: Resource Manager and Node Manager (see the sketch below).
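As a rough illustration of Question 13's answer, the toy Python sketch below (illustrative names and slot counts, not the real YARN API) shows the division of labor: the ResourceManager tracks cluster capacity and grants container requests, while each NodeManager launches containers on its own machine.

```python
class NodeManager:
    """Per-machine agent: launches and monitors containers on its node."""
    def __init__(self, name, slots):
        self.name = name
        self.free_slots = slots

    def launch_container(self, app):
        self.free_slots -= 1
        print(f"{self.name}: launched a container for {app}")

class ResourceManager:
    """Cluster-wide scheduler: grants container requests to nodes with room."""
    def __init__(self, node_managers):
        self.node_managers = node_managers

    def request_container(self, app):
        for nm in self.node_managers:
            if nm.free_slots > 0:
                nm.launch_container(app)
                return
        raise RuntimeError("no free containers anywhere in the cluster")

rm = ResourceManager([NodeManager("node1", slots=2), NodeManager("node2", slots=1)])
for app in ["wordcount", "pagerank", "log-indexing"]:
    rm.request_container(app)
```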