IGNOU MCS-226 (January 2024 - July 2024) Assignment Questions
Q1: What is Exploratory Data Analysis (EDA) and why is it important in the data science workflow? What are the key components of the data science process?
Q2: Discuss the implications of hypothesis testing results in decision-making. Provide examples of real-world situations where statistical hypothesis testing is commonly used.
Q3: What is data preprocessing, and why is it a crucial step in the data science workflow? Why is it important to identify and handle outliers in a dataset during data preprocessing?
Q4: Discuss the significance of the three Vs (Volume, Velocity, Variety) in the context of big data. Provide examples of each of the three Vs in real-world scenarios. How does MapReduce facilitate parallel processing of large datasets? Explain the functionality of the Map function in the MapReduce paradigm with the help of an example.
Q5: Explain the purpose of Apache Hive in the Hadoop ecosystem. How does Spark address limitations of the traditional MapReduce model?
Q6: Define NoSQL databases and explain the primary motivations behind their development. Provide examples of scenarios where each type of NoSQL database is suitable.
Q7: How does collaborative filtering contribute to enhancing user experience and engagement in recommendation systems? Provide examples of industries or platforms where collaborative filtering is widely used.
Q8: What is a Data Stream Bloom Filter? Explain its primary purpose in data stream processing. Also, introduce the Flajolet-Martin Algorithm and its role in estimating the cardinality of a data stream.
Q9: Describe the role of link analysis in the PageRank algorithm. How are links between web pages interpreted in the context of PageRank?
Q10: Explain the concept of decision trees in classification. Provide an example of building and visualizing a decision tree using R. How can K-means clustering be applied to a dataset in R?
IGNOU MCS-226 (January 2023 - July 2023) Assignment Questions
Q1: Describe data science. What uses does it have? In the context of data analysis, define the terms descriptive, exploratory, and predictive.
Q2: Discuss the need for Statistical Hypothesis Testing with the help of an example. Explain types of Errors in Hypothesis Testing.
Q3: Why do we need Data Preprocessing? Explain different Quality Measures in Data Preprocessing. Discuss the different strategies for Data Handling.
Q4: A class has 25 students. Create a data set of the students' marks in Mathematics out of a maximum of 50 marks. Discuss which chart will be best for Visualization & Interpretation, and draw it. Justify your answer.
Q5: What is the need for Big data? Explain 3 V’s. Discuss the master/slave Hadoop architecture with the help of an example.
Q6: Explain the concept of Map-Reduce with the help of an example.
Q7: What is the purpose of using Apache Spark, Hive, and HBase? Explain with supporting examples.
Q8: What is a NoSQL database? Discuss how a Column database and a Document database work. List and briefly discuss Graph database examples.
Q9: Explain the Jaccard similarity of sets with the help of an example. What are the ways of finding similarity between two documents? Also, define the term collaborative filtering.
Q10: Explain the Data Stream Bloom filter with the help of an example. Why do we need a Bloom filter? Discuss the working of a Bloom filter. Explain the Flajolet-Martin algorithm.
Q11: What is PageRank? Discuss the basic principle of the flow model in PageRank. Explain the different mechanisms for finding PageRank.
Q12: Explain the process and issues of the following: Advertising on web, Recommendation system, Mining of social networks.
Q13: Discuss the different data structures in R. Write programs using R for the following tasks:
(i) Computation of income tax for a vector of size 10 containing the total annual income of 10 different persons. The tax should be 10% if the annual income is below 5 lakhs and 20% if it is above 5 lakhs.
(ii) Matrix addition, subtraction and multiplication
(iii) Finding inverse of a matrix
Q14: Create sample data of the marks of 20 students in five different subjects using MS Excel. Discuss the different chart and graphing library packages supported by the R programming language. Write programs using the R programming language to create four different plots using this data.
Q15: Discuss the functions supported in the R language to differentiate between linear regression and multiple regression. Write programs using the R programming language to support your answer with any sample data.
Q16: Discuss Classification, Clustering, and Association Rules with different examples. Explain where the Random Forest algorithm can be used. Use the R programming language to discuss the Random Forest algorithm.