Quantitative Analytics

Core Concepts & Frameworks for Senior Associate Role

Role Overview

This position is within the Consumer & Community Banking (CCB) Data & Analytics organization, focusing on enabling data-driven decision-making through insights, recommendations, and strategic analysis.

Key Focus Areas:
• Reducing operating expenses through ML/AI solutions
• Improving customer and agent experience
• Data-informed product decisions
• Cross-functional collaboration (Engineering, Design, Product)

Core Skill Areas:
• Exploratory Data Analysis
• ML/AI Operations
• Product Analytics
• Agile Methodology
• Stakeholder Management
• Data Storytelling

1. Data Analysis & Statistics

Exploratory Data Analysis (EDA)

  • Univariate Analysis: Distributions, histograms, box plots, summary statistics
  • Bivariate Analysis: Scatter plots, correlation matrices, cross-tabulation
  • Multivariate Analysis: Multiple regression, MANOVA, factor analysis, canonical correlation
  • Dimensionality Reduction: PCA, t-SNE, UMAP (for visualization and feature reduction)
  • Data Quality Checks: Missing values, outliers, duplicates, data types
  • Feature Engineering: Creating derived features, binning, encoding categorical variables
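
The data quality checks above can be sketched in plain Python. This is a minimal illustration, assuming records arrive as a list of dicts; the function name `data_quality_report` and the sample handle-time records are hypothetical, and outliers are flagged with Tukey's 1.5 × IQR rule, which is one common convention rather than the only one. On a pandas DataFrame, `isna()`, `duplicated()` and `quantile()` cover the same ground.

```python
from statistics import quantiles


def data_quality_report(rows, column):
    """Count missing values and exact duplicate rows, and flag IQR outliers in one numeric column."""
    values = [row.get(column) for row in rows]
    missing = sum(v is None for v in values)
    present = [v for v in values if v is not None]

    # Exact duplicates: a row whose full set of (field, value) pairs was already seen
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    # Tukey's rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]
    return {"missing": missing, "duplicates": duplicates, "outliers": outliers}


# Hypothetical handle-time records in seconds: id 2 is duplicated, id 4 has no value
records = [
    {"id": 1, "aht": 300}, {"id": 2, "aht": 320}, {"id": 3, "aht": 310},
    {"id": 4, "aht": None}, {"id": 2, "aht": 320}, {"id": 5, "aht": 1500},
    {"id": 6, "aht": 305}, {"id": 7, "aht": 315},
]
report = data_quality_report(records, "aht")
print(report)  # {'missing': 1, 'duplicates': 1, 'outliers': [1500]}
```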

Statistical Concepts

  • Descriptive Statistics: Mean, median, mode, standard deviation, percentiles
  • Probability Distributions: Normal, binomial, Poisson, exponential
  • Hypothesis Testing: t-tests, chi-square, ANOVA, p-values, significance levels
  • Confidence Intervals: Understanding margin of error, sample size calculations
  • Correlation vs Causation: Pearson, Spearman, confounding variables
  • Statistical Power: Type I/II errors, power analysis
  • A/B Testing: Experimental design, control groups, statistical significance
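
Significance, power, and sample size for an A/B test on conversion rates can be sketched with the standard library. This assumes the normal approximation for two proportions; the helper names and experiment counts are hypothetical, and the sample-size function uses the common unpooled approximation n = (z₁₋α/₂ + z₁₋β)² × (p₁(1 - p₁) + p₂(1 - p₂)) / (p₂ - p₁)².

```python
import math
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled standard error)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def sample_size_per_group(p_base, p_target, alpha=0.05, power=0.80):
    """Approximate n per arm to detect a change from p_base to p_target (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls the Type I error rate
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - Type II error rate
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_target - p_base) ** 2)


# Hypothetical experiment: 200/2000 conversions in control vs 260/2000 in treatment
z, p = two_proportion_ztest(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")
n_needed = sample_size_per_group(0.10, 0.12)  # users per arm to detect 10% -> 12%
print(n_needed)
```

Sizing the test before launch (second function) is what protects against an underpowered experiment; the z-test is only meaningful once that sample size has been reached.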

Key Statistical Formulas

  • Two-sample t-test (pooled variance): t = (x̄₁ - x̄₂) / √(s_p²(1/n₁ + 1/n₂)), where s_p² = ((n₁ - 1)s₁² + (n₂ - 1)s₂²) / (n₁ + n₂ - 2)
  • Chi-square: χ² = Σ(O - E)² / E
  • Pearson correlation: r = Σ((xᵢ - x̄)(yᵢ - ȳ)) / √(Σ(xᵢ - x̄)² Σ(yᵢ - ȳ)²)
  • Standard deviation (population): σ = √(Σ(xᵢ - μ)² / N); the sample version s divides by (n - 1) instead
  • Confidence Interval: CI = x̄ ± (t* × SE), where SE = s / √n
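
The formulas above translate directly into code. This sketch uses only the standard library and hypothetical inputs; the t statistic uses the pooled variance s_p², and because `statistics` has no t distribution, the critical value t* is passed in by hand (about 2.262 for 95% confidence with df = 9). In practice, `scipy.stats.ttest_ind` and `scipy.stats.chisquare` compute these statistics along with their p-values.

```python
import math
from statistics import mean, pstdev, stdev, variance


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance s_p^2."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def chi_square(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def pearson_r(x, y):
    """Pearson correlation from centered cross-products."""
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
    return num / den


def confidence_interval(sample, t_star):
    """x̄ ± t* × SE with SE = s / sqrt(n); t_star must match df = n - 1 and the confidence level."""
    se = stdev(sample) / math.sqrt(len(sample))
    m = mean(sample)
    return m - t_star * se, m + t_star * se


print(pooled_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))  # mean difference of -1 over an SE of 1
print(chi_square([50, 30, 20], [40, 40, 20]))
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pstdev([2, 4, 4, 4, 5, 5, 7, 9]))  # population sigma divides by N
# 95% CI for 10 hypothetical handle times (t* for df = 9)
handle_times = [310, 295, 330, 305, 320, 290, 315, 300, 325, 310]
low, high = confidence_interval(handle_times, 2.262)
print(low, high)
```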

Data Mining Techniques

  • Pattern Recognition: Association rules (Apriori, FP-Growth)
  • Anomaly Detection: Isolation Forest, statistical methods, clustering-based
  • Time Series Analysis: Trends, seasonality, ARIMA, forecasting
  • Segmentation: Customer segmentation, RFM analysis, cohort analysis
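
Apriori and FP-Growth are search strategies for finding frequent itemsets; every candidate rule they produce is then scored with support, confidence, and lift. The sketch below shows only that scoring step for a single rule, under the assumption that each transaction is a Python set; the product baskets are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    support_a = sum(a <= t for t in transactions) / n
    support_c = sum(c <= t for t in transactions) / n
    support_ac = sum((a | c) <= t for t in transactions) / n
    confidence = support_ac / support_a  # P(consequent | antecedent)
    lift = confidence / support_c        # > 1 positive association, < 1 negative
    return {"support": support_ac, "confidence": confidence, "lift": lift}


# Hypothetical product holdings per customer
baskets = [
    {"checking", "savings"},
    {"checking", "credit_card"},
    {"checking", "savings", "credit_card"},
    {"savings"},
]
metrics = rule_metrics(baskets, {"checking"}, {"savings"})
print(metrics)
```

Here the lift is below 1, so holding a checking account makes a savings account slightly *less* likely than its base rate in this toy data, even though confidence looks high.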

2. Programming & Tools

SQL (Essential)

Advanced SQL Topics

  • Window Functions: ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD(), running totals
  • CTEs & Subqueries: Common Table Expressions, nested queries, correlated subqueries
  • Joins: INNER, LEFT, RIGHT, FULL OUTER, CROSS, self-joins
  • Aggregations: GROUP BY, HAVING, ROLLUP, CUBE, grouping sets
  • Performance Optimization: Indexing strategies, query execution plans, partitioning
  • Data Manipulation: CASE statements, COALESCE, NULLIF, string functions
  • Date/Time Functions: SQL Server (DATEADD, DATEDIFF), PostgreSQL (INTERVAL, AGE, EXTRACT), MySQL (DATE_ADD, DATE_SUB), Standard SQL (EXTRACT)
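
Several of these topics (a CTE, ROW_NUMBER(), LAG(), and a running total) fit in one query. To keep the example self-contained it runs against Python's built-in `sqlite3` module, which assumes SQLite 3.25 or later for window function support; the `calls` table and its columns are hypothetical. The window syntax is standard SQL, while date functions differ by dialect as listed above.

```python
import sqlite3

# In-memory database with a hypothetical call log table
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calls (agent_id TEXT, call_date TEXT, handle_time INTEGER);
    INSERT INTO calls VALUES
        ('A', '2024-11-01', 300), ('A', '2024-11-02', 360), ('A', '2024-11-03', 240),
        ('B', '2024-11-01', 420), ('B', '2024-11-02', 390);
""")

query = """
WITH per_agent AS (
    SELECT agent_id,
           call_date,
           handle_time,
           ROW_NUMBER() OVER (PARTITION BY agent_id ORDER BY call_date) AS call_seq,
           LAG(handle_time) OVER (PARTITION BY agent_id ORDER BY call_date) AS prev_handle_time,
           SUM(handle_time) OVER (
               PARTITION BY agent_id ORDER BY call_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM calls
)
SELECT agent_id, call_date, call_seq,
       handle_time - prev_handle_time AS delta_vs_prev,  -- NULL on each agent's first row
       running_total
FROM per_agent
ORDER BY agent_id, call_date;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```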

Python

Core Python Libraries

  • Pandas: DataFrames, groupby, merge/join, pivot tables, time series, data cleaning
  • NumPy: Arrays, vectorization, broadcasting, linear algebra operations
  • Matplotlib/Seaborn: Data visualization, statistical plots, customization
  • Scikit-learn: Preprocessing, model selection, evaluation metrics, pipelines
  • SciPy: Statistical tests, optimization, interpolation
  • Automation: File I/O, API calls, scheduling jobs, error handling
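
```python
# Example: Data transformation automation
import pandas as pd

# Load and clean data
df = pd.read_csv('operations_data.csv')
df['date'] = pd.to_datetime(df['date'])

# Feature engineering
df['hour'] = df['date'].dt.hour
df['day_of_week'] = df['date'].dt.dayofweek  # Monday=0 ... Sunday=6

# Rolling average of handle time per agent (a pandas analogue of a SQL window function).
# Sort first so each window covers consecutive records; the window is 7 rows, not 7 days,
# and the first 6 rows of each agent are NaN.
df = df.sort_values(['agent_id', 'date'])
df['rolling_avg'] = df.groupby('agent_id')['handle_time'].transform(
    lambda x: x.rolling(window=7).mean()
)

# Export results
df.to_csv('processed_data.csv', index=False)
```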

Data Analytics Tools

Tableau

Dashboards, LOD calculations, parameters, actions, filters, data blending

Alteryx

ETL workflows, data preparation, predictive tools, spatial analytics

Snowflake

Cloud data warehouse, semi-structured data, data sharing, time travel

SAS

PROC SQL, data steps, statistical procedures, macros

R

dplyr, ggplot2, tidyr, statistical modeling, Shiny dashboards

Git

Version control, branching, pull requests, code collaboration

3. Machine Learning

Supervised Learning

  • Regression: Linear, Ridge, Lasso, ElasticNet, Polynomial
  • Classification: Logistic Regression, Decision Trees, Random Forest, Gradient Boosting (XGBoost, LightGBM)
  • Support Vector Machines: Linear and non-linear kernels
  • Neural Networks: Basic architectures, backpropagation
  • Ensemble Methods: Bagging, boosting, stacking
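
The effect of regularization is easiest to see with a single feature. This sketch assumes one feature and an unpenalized intercept, where ridge regression has the closed form slope = Sxy / (Sxx + α) and α = 0 recovers ordinary least squares; the function name and data are hypothetical. scikit-learn's `LinearRegression` and `Ridge` handle the multi-feature case.

```python
from statistics import mean


def fit_line(x, y, alpha=0.0):
    """Single-feature linear fit; alpha > 0 adds an L2 (ridge) penalty on the slope only."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / (sxx + alpha)  # alpha = 0 gives ordinary least squares
    intercept = my - slope * mx  # intercept is left unpenalized
    return slope, intercept


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(fit_line(x, y))            # OLS
print(fit_line(x, y, alpha=10))  # ridge shrinks the slope toward zero
```

Shrinking the slope trades a little bias for lower variance, which is why Ridge and Lasso help when features are many or correlated.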

Unsupervised Learning

  • Clustering: K-means, DBSCAN, Hierarchical, Gaussian Mixture Models
  • Dimensionality Reduction: PCA, t-SNE, UMAP, autoencoders
  • Association Rules: Market basket analysis, recommendation systems
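
K-means clustering is Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The sketch below is illustrative only; it takes initial centroids explicitly to stay deterministic (scikit-learn's `KMeans` defaults to k-means++ initialization), and the 2-D customer features are hypothetical and assumed to be already scaled.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its previous centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical scaled features, e.g. (balance, monthly transactions)
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
labels, centroids = kmeans(points, initial_centroids=[(0, 0), (10, 10)])
print(labels, centroids)
```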

Model Development & Evaluation

  • Feature Engineering: Selection methods (RFE, LASSO), importance ranking, interaction terms
  • Train/Validation/Test Split: Cross-validation, stratification, time-based splits
  • Metrics:
    • Classification: Accuracy, Precision, Recall, F1, ROC-AUC, Confusion Matrix
    • Regression: RMSE, MAE, R², MAPE
  • Key ML Formulas:
    • Precision: P = TP / (TP + FP)
    • Recall: R = TP / (TP + FN)
    • F1 Score: F1 = 2 × (P × R) / (P + R)
    • RMSE: √(Σ(yᵢ - ŷᵢ)² / n)
    • MAE: Σ|yᵢ - ŷᵢ| / n
    • R²: 1 - (SS_res / SS_tot), where SS_res = Σ(yᵢ - ŷᵢ)² and SS_tot = Σ(yᵢ - ȳ)²
  • Hyperparameter Tuning: Grid search, random search, Bayesian optimization
  • Overfitting Prevention: Regularization, early stopping, dropout
  • Model Interpretability: SHAP, LIME, feature importance, partial dependence plots
  • Model Monitoring: Performance degradation, data drift, retraining strategies
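
The evaluation metrics can be computed directly from the formulas above. This is a plain-Python sketch with hypothetical labels and predictions; it assumes at least one predicted positive and one actual positive (otherwise precision or recall divides by zero), a case that `sklearn.metrics` handles with its `zero_division` parameter.

```python
import math


def classification_metrics(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = positive class)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def regression_metrics(y, y_hat):
    """RMSE, MAE and R^2 from residuals y - y_hat."""
    n = len(y)
    residuals = [yi - fi for yi, fi in zip(y, y_hat)]
    rmse = math.sqrt(sum(r ** 2 for r in residuals) / n)
    mae = sum(abs(r) for r in residuals) / n
    y_bar = sum(y) / n
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return rmse, mae, 1 - ss_res / ss_tot


prec, rec, f1 = classification_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1, 0])
rmse, mae, r2 = regression_metrics([3, 5, 7], [2, 5, 9])
print(prec, rec, f1)
print(rmse, mae, r2)
```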

ML Operations Context

  • Model Deployment: Batch vs real-time scoring, API development
  • A/B Testing: Champion/challenger models, gradual rollout
  • Business Impact: Cost savings calculation, ROI measurement
  • Ethical Considerations: Bias detection, fairness metrics, model governance
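
One simple fairness check compares favorable-outcome rates across groups. The sketch below computes the ratio of the lowest to the highest selection rate and flags it below 0.8 (the "four-fifths rule"); that threshold is a widely used screening heuristic, not a complete fairness assessment. The group names and approval decisions are hypothetical.

```python
def selection_rates(outcomes):
    """Share of favorable outcomes (1) per group."""
    return {group: sum(vals) / len(vals) for group, vals in outcomes.items()}


def disparate_impact_ratio(outcomes):
    """Lowest group selection rate divided by the highest."""
    rates = selection_rates(outcomes)
    return min(rates.values()) / max(rates.values())


# Hypothetical model approvals (1 = approved) for two applicant groups
decisions = {
    "group_a": [1] * 40 + [0] * 60,
    "group_b": [1] * 30 + [0] * 70,
}
ratio = disparate_impact_ratio(decisions)
print(f"{ratio:.2f}", "review" if ratio < 0.8 else "ok")
```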

4. Cloud & Infrastructure (AWS)

Core AWS Services for Analytics

  • S3: Data lake storage, bucket policies, lifecycle rules, versioning
  • Redshift: Data warehousing, Redshift Spectrum for querying S3, distribution keys
  • Athena: Serverless SQL queries on S3 data
  • Glue: ETL jobs, data catalog, crawlers, schema discovery
  • EMR: Big data processing with Spark, Hadoop
  • SageMaker: ML model training, deployment, notebooks
  • Lambda: Serverless computing, event-driven processing
  • QuickSight: BI dashboards, visualizations
  • IAM: Access control, roles, policies

Data Pipeline Concepts

  • ETL vs ELT: Transform before loading (ETL) vs load raw data first and transform inside the warehouse (ELT)
  • Data Quality: Validation rules, error handling, data lineage
  • Orchestration: Airflow, Step Functions, scheduling dependencies
  • Incremental Loading: Change data capture, timestamps, watermarks
  • Data Formats: Columnar (Parquet, ORC) vs row-oriented (Avro, JSON) formats and their storage/query trade-offs
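
Watermark-based incremental loading keeps the latest `updated_at` seen and pulls only rows changed after it on the next run. This is a minimal sketch assuming each source row carries a reliable `updated_at` timestamp; the function and field names are hypothetical. A strict `>` comparison can miss late-arriving rows or rows sharing the watermark's exact timestamp, which log-based change data capture tools are designed to handle.

```python
from datetime import datetime


def incremental_load(source_rows, watermark):
    """Return rows changed after the watermark, plus the advanced watermark."""
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]
    # Advance only when new rows arrived, so an empty run never moves the watermark
    new_watermark = max((r["updated_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark


source = [
    {"id": 1, "updated_at": datetime(2024, 11, 17, 23, 0)},
    {"id": 2, "updated_at": datetime(2024, 11, 18, 6, 30)},
    {"id": 3, "updated_at": datetime(2024, 11, 18, 7, 45)},
]
last_run = datetime(2024, 11, 18, 0, 0)
batch, last_run = incremental_load(source, last_run)
print([r["id"] for r in batch], last_run)
```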

5. Project Management & Agile

Agile Methodology

  • Scrum Framework: Sprints, daily standups, sprint planning, retrospectives, sprint reviews
  • Kanban: WIP limits, continuous flow, visual board management
  • User Stories: Writing acceptance criteria, story points, estimation
  • Backlog Management: Prioritization, grooming, epic breakdown
  • Agile Roles: Product Owner, Scrum Master, Development Team interactions
  • Metrics: Velocity, burndown charts, cycle time, throughput
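
The flow metrics reduce to simple arithmetic: velocity is the average story points completed per sprint, cycle time is the elapsed time from starting an item to finishing it, and throughput is items finished per unit of time. The sprint history and work items below are hypothetical, and cycle time is measured in calendar days as one possible convention.

```python
from datetime import date
from statistics import mean

# Hypothetical data for one team
sprint_points = [21, 18, 24, 25]  # story points completed in the last four sprints
work_items = [                    # (started, done) for items finished this sprint
    (date(2024, 11, 1), date(2024, 11, 5)),
    (date(2024, 11, 4), date(2024, 11, 12)),
    (date(2024, 11, 6), date(2024, 11, 8)),
]
sprint_weeks = 2

velocity = mean(sprint_points)                                     # average points per sprint
cycle_times = [(done - start).days for start, done in work_items]  # calendar days, start to done
throughput = len(work_items) / sprint_weeks                        # items finished per week

print(velocity, cycle_times, mean(cycle_times), throughput)
```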

Analytics Project Lifecycle

  • Problem Definition: Business objectives, success metrics, constraints
  • Data Collection: Source identification, access requirements, data assessment
  • Analysis & Modeling: Approach selection, iteration, validation
  • Insights & Recommendations: Actionable findings, impact quantification
  • Implementation: Deployment, monitoring, handoff documentation
  • Post-Deployment: Performance tracking, refinement, lessons learned

Independent Problem Solving

  • Navigating organizational complexity and finding stakeholders
  • Resourcefulness in finding data and documentation
  • Breaking down ambiguous requirements into actionable tasks
  • Risk identification and mitigation strategies
  • Escalation protocols and when to ask for help

6. Business Analytics

Building Business Cases

  • Cost-Benefit Analysis: ROI calculation, NPV, payback period
  • Impact Sizing: Market size estimation, addressable opportunity
  • Assumptions & Risks: Sensitivity analysis, scenario planning
  • Implementation Roadmap: Phased approach, resource requirements
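
The core business-case numbers can be computed from a list of yearly cash flows. This sketch assumes the upfront investment sits at index 0 as a negative value and that savings accrue evenly within each year (which the payback interpolation relies on); the project figures and the 10% discount rate are hypothetical.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] is the upfront amount at t = 0 (usually negative)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def simple_roi(cash_flows):
    """Undiscounted ROI: (total benefit - investment) / investment."""
    investment = -cash_flows[0]
    return (sum(cash_flows[1:]) - investment) / investment


def payback_period(cash_flows):
    """Years until cumulative cash flow turns non-negative, interpolating within the year."""
    cumulative = cash_flows[0]
    for year, cf in enumerate(cash_flows[1:], start=1):
        if cumulative + cf >= 0:
            return year - 1 + (-cumulative) / cf
        cumulative += cf
    return None  # does not pay back within the horizon


# Hypothetical automation project: $100k upfront, $40k/year in savings for 4 years
flows = [-100_000, 40_000, 40_000, 40_000, 40_000]
print(round(npv(0.10, flows), 2), simple_roi(flows), payback_period(flows))
```

A positive NPV at the chosen discount rate is the stronger signal; simple ROI ignores the time value of money, and payback ignores everything after the break-even point.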

KPIs & Metrics Development

  • Operations Metrics:
    • Average Handle Time (AHT), First Call Resolution (FCR)
    • Service Level, Abandonment Rate, Agent Utilization
    • Quality scores, Compliance rates
  • Banking Metrics Formulas:
    • AHT = (Total Talk Time + Total Hold Time + Total Wrap-up Time) / Total Calls
    • FCR = (Calls Resolved on First Contact / Total Calls) × 100
    • Service Level = (Calls Answered within Threshold / Total Calls Offered) × 100
    • NPS = % Promoters (9-10) - % Detractors (0-6)
  • Customer Experience:
    • Net Promoter Score (NPS), Customer Satisfaction (CSAT)
    • Customer Effort Score (CES), Churn rate
    • Journey completion rates, Drop-off analysis
  • Product Analytics:
    • Adoption rate, Feature usage, Retention cohorts
    • Conversion funnels, Time-to-value
  • ML Model Performance:
    • Prediction accuracy, Model lift, Business impact
    • Cost savings, Efficiency gains
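
The operations formulas above can be sketched as small functions. The daily totals and survey scores are hypothetical, and this assumes handle-time components are already summed in seconds; real definitions (for example, which contacts count toward FCR or what the service-level threshold is) vary by organization.

```python
def aht(talk_sec, hold_sec, wrap_sec, calls):
    """Average Handle Time in seconds per call."""
    return (talk_sec + hold_sec + wrap_sec) / calls


def pct(numerator, denominator):
    """Ratio expressed as a percentage (used for FCR and Service Level)."""
    return numerator / denominator * 100


def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return (promoters - detractors) / len(scores) * 100


# Hypothetical daily totals for one queue
print(aht(talk_sec=3000, hold_sec=600, wrap_sec=900, calls=10))  # seconds per call
print(pct(78, 100))  # FCR: 78 of 100 calls resolved on first contact
print(pct(80, 100))  # Service Level: 80 of 100 offered calls answered within threshold
print(nps([10, 9, 8, 7, 6, 3, 10, 9, 5, 10]))  # 5 promoters, 3 detractors out of 10
```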

Data Storytelling

  • Narrative Structure: Context → Complication → Resolution
  • Audience Adaptation: Technical vs executive presentations
  • Visualization Principles:
    • Chart selection (bar, line, scatter, heatmap)
    • Color usage, labeling, annotations
    • Reducing chart junk, emphasizing key insights
  • Insights vs Observations: "So what?" test, actionability
  • Executive Summaries: BLUF (Bottom Line Up Front), pyramid principle

Exploratory Analysis for Product Opportunities

  • Identifying pain points in user journeys
  • Benchmarking against industry standards
  • Cohort analysis to understand behavior segments
  • Root cause analysis techniques (5 Whys, Fishbone diagrams)
  • Prioritization frameworks (RICE, ICE, Value vs Effort)
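
RICE scores an idea as (Reach × Impact × Confidence) / Effort. The backlog below is hypothetical; it assumes reach in customers per quarter, impact on a commonly used 0.25 to 3 scale, confidence as a fraction, and effort in person-months. The exact scales are a team convention, so scores are only comparable within one backlog.

```python
def rice_score(reach, impact, confidence, effort):
    """RICE = (Reach x Impact x Confidence) / Effort."""
    return reach * impact * confidence / effort


# Hypothetical backlog items
ideas = {
    "self_service_password_reset": rice_score(50_000, 2, 0.8, 4),
    "dispute_status_tracker": rice_score(12_000, 3, 0.5, 6),
    "chat_intent_routing": rice_score(30_000, 1, 0.9, 3),
}
ranking = sorted(ideas, key=ideas.get, reverse=True)
for name in ranking:
    print(f"{name}: {ideas[name]:,.0f}")
```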

7. Communication & Stakeholder Management

Working with Senior Leaders

  • Executive Presence: Confidence, clarity, conciseness
  • Meeting Preparation: Pre-reads, agenda setting, objectives
  • Presenting Recommendations: Options analysis, pros/cons, recommended path
  • Handling Questions: Bridging techniques, parking lot for tangents
  • Follow-up: Action items, timelines, accountability

Cross-Functional Collaboration

  • Engineering Teams: Technical feasibility, data requirements, API specs
  • Product Teams: Feature prioritization, user stories, success metrics
  • Design Teams: User research insights, prototype testing data
  • Business Units: Requirements gathering, change management, training
  • Influence without Authority: Building trust, finding win-wins, persistence

Presentation Skills

  • Structuring slides for different audiences (technical vs business)
  • Using appendix for detailed methodology and assumptions
  • Rehearsing timing and transitions
  • Managing Q&A sessions
  • Creating compelling dashboards that tell a story

8. Banking & Financial Services Domain

Consumer & Community Banking (CCB)

  • Product Lines:
    • Personal banking (checking, savings accounts)
    • Credit cards, mortgages, auto financing
    • Investment advice, wealth management
    • Small business banking
  • Customer Lifecycle: Acquisition, onboarding, engagement, retention, churn
  • Servicing Operations:
    • Call centers, branch operations, digital servicing
    • Dispute resolution, fraud prevention
    • Account maintenance, payment processing

Operational Excellence in Banking

  • Cost Reduction: Process automation, straight-through processing, self-service adoption
  • Efficiency Metrics: Cost per transaction, Cost per account, Operating expense ratio
  • Regulatory Compliance: KYC/AML, GDPR/privacy, audit trails
  • Risk Management: Fraud detection, credit risk, operational risk

ML/AI Use Cases in Banking Operations

  • Customer Service: Chatbots, intent classification, sentiment analysis, call routing
  • Fraud Detection: Anomaly detection, transaction monitoring, pattern recognition
  • Process Automation: Document processing (OCR), data extraction, RPA augmentation
  • Predictive Analytics: Churn prediction, product recommendations, loan default prediction
  • Operational Forecasting: Call volume prediction, staffing optimization
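
One standard way to turn a call-volume forecast into a staffing requirement is the Erlang C queueing model, swapped in here as an illustration rather than as a method this guide prescribes. It assumes Poisson arrivals, exponentially distributed handle times, and no abandonment, and it ignores shrinkage (breaks, training, absence). The forecast numbers and function names are hypothetical.

```python
import math


def erlang_c(traffic, agents):
    """Probability an arriving call waits; traffic is the offered load A in Erlangs."""
    if agents <= traffic:
        return 1.0  # queue is unstable: every call waits
    term, total = 1.0, 1.0  # running A^k / k! and its sum, starting at k = 0
    for k in range(1, agents):
        term *= traffic / k
        total += term  # total = sum over k = 0..N-1 of A^k / k!
    top = term * traffic / (agents - traffic)  # (A^N / N!) * N / (N - A)
    return top / (total + top)


def service_level(traffic, agents, aht_sec, target_sec):
    """Share of calls answered within target_sec."""
    wait_prob = erlang_c(traffic, agents)
    return 1 - wait_prob * math.exp(-(agents - traffic) * target_sec / aht_sec)


def required_agents(calls_per_interval, interval_sec, aht_sec, target_sec, sl_goal):
    """Smallest agent count meeting the service-level goal for one interval."""
    traffic = calls_per_interval * aht_sec / interval_sec  # offered load A
    agents = math.floor(traffic) + 1
    while service_level(traffic, agents, aht_sec, target_sec) < sl_goal:
        agents += 1
    return agents


# Hypothetical forecast: 100 calls per 30-minute interval, 180 s AHT, 80% answered in 20 s
n_agents = required_agents(100, 1800, 180, 20, 0.80)
print(n_agents)
```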

Understanding the Quad

  • Four Key Functions:
    • Product: Feature roadmap, user experience
    • Engineering: Technical implementation, infrastructure
    • Design: User research, interface design
    • Data/Analytics: Insights, measurement, optimization
  • How analytics informs product roadmap decisions
  • Establishing communication networks across teams

9. Dashboard Design Best Practices

Design Principles

  • 1. Know Your Audience:
    • Executive: High-level KPIs, trends, anomalies
    • Manager: Performance tracking, comparisons, drill-downs
    • Analyst: Granular data, filters, ad-hoc exploration
  • 2. Information Hierarchy:
    • Most important metrics top-left (Western reading pattern)
    • Group related metrics together
    • Progressive disclosure: summary → details
  • 3. Visual Best Practices:
    • Limit colors (3-4 max for categories)
    • Use white space effectively
    • Consistent formatting (dates, numbers, percentages)
    • Clear labels, no jargon
    • Include context (targets, benchmarks, previous period)
  • 4. Interactivity:
    • Filters should be prominent but not dominate
    • Drill-down paths should be intuitive
    • Tooltips for additional context
    • Reset button to return to default view

Chart Selection Guide

  • Comparison: Bar chart (categorical), Column chart (time-based)
  • Trend over time: Line chart
  • Part-to-whole: Pie/donut (if <6 categories), Stacked bar, Treemap
  • Distribution: Histogram, Box plot, Violin plot
  • Correlation: Scatter plot, Bubble chart
  • Magnitude: Bullet chart (vs target), KPI cards
  • Ranking: Horizontal bar chart (sorted)
  • Geographical: Filled map, Symbol map
  • Multi-dimensional: Heatmap, Small multiples

KPI Dashboard Structure (Operations)

```
┌─────────────────────────────────────────────────────────────┐
│ OPERATIONS PERFORMANCE DASHBOARD                 Month: Nov │
├─────────────────────────────────────────────────────────────┤
│ [Filters: Date Range | Channel | Region | Product]          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  KEY METRICS (Big Numbers with Trend Indicators)            │
│  ┌──────────────┬──────────────┬──────────────┬──────────┐  │
│  │ Call Volume  │ Avg Handle   │ First Call   │ CSAT     │  │
│  │ 125,430      │ Time         │ Resolution   │ 4.2/5.0  │  │
│  │ ▲ 5.2%       │ 6:32 min     │ 78.5%        │ ▼ 0.3    │  │
│  │              │ ▼ 12s        │ ▲ 2.1%       │          │  │
│  └──────────────┴──────────────┴──────────────┴──────────┘  │
│                                                             │
│  TRENDS (Line Charts)                                       │
│  ┌────────────────────────────┬──────────────────────────┐  │
│  │ Call Volume - Last 30 Days │ FCR Rate by Channel      │  │
│  │ [Line chart showing daily  │ [Multi-line: Phone, Chat,│  │
│  │ volume with 7-day MA]      │ Email trends]            │  │
│  └────────────────────────────┴──────────────────────────┘  │
│                                                             │
│  PERFORMANCE BREAKDOWN                                      │
│  ┌────────────────────────────┬──────────────────────────┐  │
│  │ Top Call Drivers           │ Agent Performance        │  │
│  │ [Horizontal bar chart:     │ [Scatter: AHT vs CSAT    │  │
│  │ 1. Password reset - 25%    │ with bubble size = vol]  │  │
│  │ 2. Balance inquiry - 18%   │                          │  │
│  │ 3. Dispute - 15%...]       │                          │  │
│  └────────────────────────────┴──────────────────────────┘  │
│                                                             │
│  ALERTS & ACTIONS                                           │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ ⚠ CSAT below target in West region (-0.5)             │  │
│  │ ⚠ Call volume spike on Nov 15 (+35%) - investigate    │  │
│  │ ✓ AHT improvement target met for 3rd consecutive month│  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│ Last updated: 2024-11-18 08:00 ET       [Export] [Schedule] │
└─────────────────────────────────────────────────────────────┘
```

Common Mistakes to Avoid

  • ❌ 3D charts, pie charts with many slices, dual axes with different scales
  • ❌ Too many metrics (cognitive overload)
  • ❌ Misleading axes (not starting at zero for bar charts)
  • ❌ Red/green without alternative encoding (colorblind accessibility)
  • ❌ Meaningless default sort (alphabetical when magnitude matters)
  • ❌ No context (numbers without comparison)
  • ✅ Simple, focused, actionable insights
  • ✅ Clear data sources and refresh time
  • ✅ Mobile-responsive design

10. Sources & References

Statistical Foundations

  • Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Duxbury Press.
  • Montgomery, D. C., Peck, E. A., & Vining, G. G. (2012). Introduction to Linear Regression Analysis (5th ed.). Wiley.
  • Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer.
  • Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Machine Learning & AI

  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer. https://web.stanford.edu/~hastie/ElemStatLearn/
  • Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. https://www.deeplearningbook.org/
  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning. Springer. https://www.statlearning.com/
  • Breiman, L. (2001). "Random Forests." Machine Learning, 45(1), 5-32.
  • Chen, T., & Guestrin, C. (2016). "XGBoost: A Scalable Tree Boosting System." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. arXiv:1603.02754

Data Visualization & Communication

  • Tufte, E. R. (2001). The Visual Display of Quantitative Information (2nd ed.). Graphics Press.
  • Cairo, A. (2016). The Truthful Art: Data, Charts, and Maps for Communication. New Riders.
  • Knaflic, C. N. (2015). Storytelling with Data: A Data Visualization Guide for Business Professionals. Wiley.
  • Few, S. (2012). Show Me the Numbers: Designing Tables and Graphs to Enlighten (2nd ed.). Analytics Press.
  • Wilke, C. O. (2019). Fundamentals of Data Visualization. O'Reilly. https://clauswilke.com/dataviz/

Python & Data Science Tools

  • McKinney, W. (2017). Python for Data Analysis (2nd ed.). O'Reilly.
  • VanderPlas, J. (2016). Python Data Science Handbook. O'Reilly. https://jakevdp.github.io/PythonDataScienceHandbook/
  • Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). O'Reilly.

Business Analytics & Product

  • Davenport, T. H., & Harris, J. G. (2007). Competing on Analytics: The New Science of Winning. Harvard Business Press.
  • Provost, F., & Fawcett, T. (2013). Data Science for Business. O'Reilly.
  • Croll, A., & Yoskovitz, B. (2013). Lean Analytics: Use Data to Build a Better Startup Faster. O'Reilly.
  • Kohavi, R., Tang, D., & Xu, Y. (2020). Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press.

Key Research Papers

  • Lundberg, S. M., & Lee, S. I. (2017). "A Unified Approach to Interpreting Model Predictions." NIPS. arXiv:1705.07874
  • Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why Should I Trust You?: Explaining the Predictions of Any Classifier." KDD. arXiv:1602.04938
  • McInnes, L., Healy, J., & Melville, J. (2018). "UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction." arXiv:1802.03426
  • Ke, G., et al. (2017). "LightGBM: A Highly Efficient Gradient Boosting Decision Tree." NIPS.

Cloud & Engineering

  • Kleppmann, M. (2017). Designing Data-Intensive Applications. O'Reilly.
  • AWS Documentation. https://docs.aws.amazon.com/
  • Reis, J., & Housley, M. (2022). Fundamentals of Data Engineering. O'Reilly.

Note on References:
These resources provide comprehensive coverage of the concepts outlined in this guide. For interview preparation, focus on understanding core principles rather than memorizing formulas. Practical experience with Python, SQL, and data visualization tools is essential.