Quantitative Analytics
Core Concepts & Frameworks for Senior Associate Role
Role Overview
This position is within the CCB Data & Analytics organization, focusing on enabling data-driven decision-making through insights, recommendations, and strategic analysis.
Key Focus Areas:
• Reducing operating expenses through ML/AI solutions
• Improving customer and agent experience
• Data-informed product decisions
• Cross-functional collaboration (Engineering, Design, Product)
Exploratory Data Analysis
ML/AI Operations
Product Analytics
Agile Methodology
Stakeholder Management
Data Storytelling
1. Data Analysis & Statistics
Exploratory Data Analysis (EDA)
- Univariate Analysis: Distributions, histograms, box plots, summary statistics
- Bivariate Analysis: Scatter plots, correlation matrices, cross-tabulation
- Multivariate Analysis: Multiple regression, MANOVA, factor analysis, canonical correlation
- Dimensionality Reduction: PCA, t-SNE, UMAP (for visualization and feature reduction)
- Data Quality Checks: Missing values, outliers, duplicates, data types
- Feature Engineering: Creating derived features, binning, encoding categorical variables
Statistical Concepts
- Descriptive Statistics: Mean, median, mode, standard deviation, percentiles
- Probability Distributions: Normal, binomial, Poisson, exponential
- Hypothesis Testing: t-tests, chi-square, ANOVA, p-values, significance levels
- Confidence Intervals: Understanding margin of error, sample size calculations
- Correlation vs Causation: Pearson, Spearman, confounding variables
- Statistical Power: Type I/II errors, power analysis
- A/B Testing: Experimental design, control groups, statistical significance
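As a rough illustration of the A/B testing and hypothesis-testing items above, the sketch below runs a two-sided two-proportion z-test using only the standard library. The function name and the conversion counts are hypothetical, and the test assumes samples large enough for the normal approximation to hold.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled SE)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: control (A) vs. treatment (B)
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Compare the p-value against the significance level chosen before the experiment started (commonly 0.05), not one picked after seeing the data.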
Key Statistical Formulas
- T-test (two-sample, pooled variance): t = (x̄₁ - x̄₂) / √(s_p²(1/n₁ + 1/n₂)), where s_p² = ((n₁ - 1)s₁² + (n₂ - 1)s₂²) / (n₁ + n₂ - 2)
- Chi-square: χ² = Σ(O - E)² / E
- Pearson correlation: r = Σ((xᵢ - x̄)(yᵢ - ȳ)) / √(Σ(xᵢ - x̄)² Σ(yᵢ - ȳ)²)
- Standard deviation (population): σ = √(Σ(xᵢ - μ)² / N); for a sample, use x̄ and divide by n - 1
- Confidence Interval: CI = x̄ ± (t* × SE), where SE = s / √n
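The formulas above can be checked by hand with a few lines of Python. This is a minimal sketch using only the standard library; the function names and sample values are made up for illustration, and the t-statistic assumes the pooled-variance (equal variance) form of the two-sample test.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_1, sample_2):
    """Two-sample t-statistic with pooled variance (assumes equal variances)."""
    n1, n2 = len(sample_1), len(sample_2)
    # statistics.variance is the sample variance (divides by n - 1)
    s2_pooled = (
        (n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)
    ) / (n1 + n2 - 2)
    return (mean(sample_1) - mean(sample_2)) / sqrt(s2_pooled * (1 / n1 + 1 / n2))


def pearson_r(xs, ys):
    """Pearson correlation computed directly from the definition."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sqrt(
        sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys)
    )
    return numerator / denominator


# Hypothetical handle times (minutes) before and after a process change
before = [6.8, 7.1, 6.5, 7.4, 6.9]
after = [6.1, 6.4, 6.0, 6.6, 6.3]
t_stat = pooled_t_statistic(before, after)

# A perfectly linear relationship should give r = 1
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

In practice a library such as SciPy would also return the p-value; the point here is only to connect each symbol in the formulas to a computation.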
Data Mining Techniques
- Pattern Recognition: Association rules (Apriori, FP-Growth)
- Anomaly Detection: Isolation Forest, statistical methods, clustering-based
- Time Series Analysis: Trends, seasonality, ARIMA, forecasting
- Segmentation: Customer segmentation, RFM analysis, cohort analysis
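One of the simplest "statistical methods" for anomaly detection is the interquartile-range (IQR) rule. The sketch below is an assumption about what such a check might look like, not a prescribed method; the function name, the multiplier k = 1.5, and the call volumes are illustrative.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical daily call volumes with one obvious spike
daily_call_volume = [410, 395, 402, 420, 398, 405, 415, 990, 400, 408]
outliers = iqr_outliers(daily_call_volume)
```

The IQR rule is robust to the outliers it is trying to find, which a mean-and-standard-deviation threshold is not. Model-based methods such as Isolation Forest are better suited to multivariate data.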
2. Programming & Tools
SQL (Essential)
Advanced SQL Topics
- Window Functions: ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD(), running totals
- CTEs & Subqueries: Common Table Expressions, nested queries, correlated subqueries
- Joins: INNER, LEFT, RIGHT, FULL OUTER, CROSS, self-joins
- Aggregations: GROUP BY, HAVING, ROLLUP, CUBE, grouping sets
- Performance Optimization: Indexing strategies, query execution plans, partitioning
- Data Manipulation: CASE statements, COALESCE, NULLIF, string functions
- Date/Time Functions: SQL Server (DATEADD, DATEDIFF), PostgreSQL (INTERVAL, AGE, EXTRACT), MySQL (DATE_ADD, DATE_SUB), Standard SQL (EXTRACT)
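To make the window-function and CTE items above concrete, here is a small sketch that runs SQL through Python's built-in sqlite3 module. The table name, columns, and rows are hypothetical, and it assumes the bundled SQLite is version 3.25 or later (the first release with window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (agent_id TEXT, call_date TEXT, handle_time INTEGER)"
)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        ("A1", "2024-11-01", 300),
        ("A1", "2024-11-02", 360),
        ("A1", "2024-11-03", 330),
        ("A2", "2024-11-01", 420),
        ("A2", "2024-11-02", 390),
    ],
)

# CTE + window functions: sequence number, previous value, running total
query = """
WITH ordered AS (
    SELECT
        agent_id,
        call_date,
        handle_time,
        ROW_NUMBER() OVER (PARTITION BY agent_id ORDER BY call_date) AS call_seq,
        LAG(handle_time) OVER (PARTITION BY agent_id ORDER BY call_date) AS prev_handle_time,
        SUM(handle_time) OVER (PARTITION BY agent_id ORDER BY call_date) AS running_total
    FROM calls
)
SELECT * FROM ordered ORDER BY agent_id, call_date
"""
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```

LAG returns NULL (None in Python) for the first row of each partition, which is why COALESCE often appears alongside it.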
Python
Core Python Libraries
- Pandas: DataFrames, groupby, merge/join, pivot tables, time series, data cleaning
- NumPy: Arrays, vectorization, broadcasting, linear algebra operations
- Matplotlib/Seaborn: Data visualization, statistical plots, customization
- Scikit-learn: Preprocessing, model selection, evaluation metrics, pipelines
- SciPy: Statistical tests, optimization, interpolation
- Automation: File I/O, API calls, scheduling jobs, error handling
# Example: Data transformation automation
import pandas as pd
# Load data and parse timestamps
df = pd.read_csv('operations_data.csv')
df['date'] = pd.to_datetime(df['date'])
# Feature engineering: hour of day and day of week (Monday = 0)
df['hour'] = df['date'].dt.hour
df['day_of_week'] = df['date'].dt.dayofweek
# Rolling mean of handle time over each agent's last 7 records
# (sort first so the window follows chronological order)
df = df.sort_values(['agent_id', 'date'])
df['rolling_avg'] = df.groupby('agent_id')['handle_time'].transform(
    lambda x: x.rolling(window=7).mean()
)
# Export results
df.to_csv('processed_data.csv', index=False)
Data Analytics Tools
Tableau
Dashboards, LOD calculations, parameters, actions, filters, data blending
Alteryx
ETL workflows, data preparation, predictive tools, spatial analytics
Snowflake
Cloud data warehouse, semi-structured data, data sharing, time travel
SAS
PROC SQL, data steps, statistical procedures, macros
R
dplyr, ggplot2, tidyr, statistical modeling, Shiny dashboards
Git
Version control, branching, pull requests, code collaboration
3. Machine Learning
Supervised Learning
- Regression: Linear, Ridge, Lasso, ElasticNet, Polynomial
- Classification: Logistic Regression, Decision Trees, Random Forest, Gradient Boosting (XGBoost, LightGBM)
- Support Vector Machines: Linear and non-linear kernels
- Neural Networks: Basic architectures, backpropagation
- Ensemble Methods: Bagging, boosting, stacking
Unsupervised Learning
- Clustering: K-means, DBSCAN, Hierarchical, Gaussian Mixture Models
- Dimensionality Reduction: PCA, t-SNE, UMAP, autoencoders
- Association Rules: Market basket analysis, recommendation systems
Model Development & Evaluation
- Feature Engineering: Selection methods (RFE, LASSO), importance ranking, interaction terms
- Train/Validation/Test Split: Cross-validation, stratification, time-based splits
- Metrics:
  - Classification: Accuracy, Precision, Recall, F1, ROC-AUC, Confusion Matrix
  - Regression: RMSE, MAE, R², MAPE
- Key ML Formulas:
  - Precision: P = TP / (TP + FP)
  - Recall: R = TP / (TP + FN)
  - F1 Score: F1 = 2 × (P × R) / (P + R)
  - RMSE: √(Σ(yᵢ - ŷᵢ)² / n)
  - MAE: Σ|yᵢ - ŷᵢ| / n
  - R²: 1 - (SS_res / SS_tot)
- Hyperparameter Tuning: Grid search, random search, Bayesian optimization
- Overfitting Prevention: Regularization, early stopping, dropout
- Model Interpretability: SHAP, LIME, feature importance, partial dependence plots
- Model Monitoring: Performance degradation, data drift, retraining strategies
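The metric formulas listed in this section translate directly into code. The following is a minimal sketch in plain Python; the function names, labels, and predictions are invented for illustration, and binary labels are assumed to be 0/1 with 1 as the positive class.

```python
from math import sqrt


def classification_metrics(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


def regression_metrics(y_true, y_pred):
    """RMSE, MAE, and R² for numeric predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r ** 2 for r in residuals)
    y_bar = sum(y_true) / n
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    rmse = sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2


precision, recall, f1 = classification_metrics(
    [1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1, 0]
)
rmse, mae, r2 = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 8.0])
```

Scikit-learn provides equivalent functions in sklearn.metrics; writing them out once is mainly useful for explaining what each number means.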
ML Operations Context
- Model Deployment: Batch vs real-time scoring, API development
- A/B Testing: Champion/challenger models, gradual rollout
- Business Impact: Cost savings calculation, ROI measurement
- Ethical Considerations: Bias detection, fairness metrics, model governance
4. Cloud & Infrastructure (AWS)
Core AWS Services for Analytics
- S3: Data lake storage, bucket policies, lifecycle rules, versioning
- Redshift: Data warehousing, Redshift Spectrum for querying S3, distribution keys
- Athena: Serverless SQL queries on S3 data
- Glue: ETL jobs, data catalog, crawlers, schema discovery
- EMR: Big data processing with Spark, Hadoop
- SageMaker: ML model training, deployment, notebooks
- Lambda: Serverless computing, event-driven processing
- QuickSight: BI dashboards, visualizations
- IAM: Access control, roles, policies
Data Pipeline Concepts
- ETL vs ELT: ETL transforms data before loading it into the target; ELT loads raw data first and transforms it inside the warehouse
- Data Quality: Validation rules, error handling, data lineage
- Orchestration: Airflow, Step Functions, scheduling dependencies
- Incremental Loading: Change data capture, timestamps, watermarks
- Data Formats: Parquet, ORC, Avro, JSON optimization
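Incremental loading with a timestamp watermark can be sketched as follows. This is a simplified, in-memory illustration under stated assumptions: the function name, the row layout, and the updated_at field are hypothetical, and a real pipeline would read from and write to actual storage.

```python
from datetime import datetime


def incremental_load(source_rows, target_rows, watermark):
    """Append rows updated after the watermark; return the new watermark."""
    new_rows = [row for row in source_rows if row["updated_at"] > watermark]
    target_rows.extend(new_rows)
    if new_rows:
        # Advance the watermark to the latest timestamp just loaded
        watermark = max(row["updated_at"] for row in new_rows)
    return watermark


source = [
    {"id": 1, "updated_at": datetime(2024, 11, 1, 8, 0)},
    {"id": 2, "updated_at": datetime(2024, 11, 2, 8, 0)},
    {"id": 3, "updated_at": datetime(2024, 11, 3, 8, 0)},
]
target = []

# Row 1 is older than the stored watermark, so only rows 2 and 3 load
watermark = datetime(2024, 11, 1, 12, 0)
watermark = incremental_load(source, target, watermark)
loaded_ids = [row["id"] for row in target]

# Running again with the advanced watermark loads nothing new
rerun_watermark = incremental_load(source, target, watermark)
```

The strict greater-than comparison keeps reruns idempotent, but it can miss rows that share the watermark's exact timestamp and arrive late. Change data capture avoids that gap at the cost of more infrastructure.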
5. Project Management & Agile
Agile Methodology
- Scrum Framework: Sprints, daily standups, sprint planning, retrospectives, sprint reviews
- Kanban: WIP limits, continuous flow, visual board management
- User Stories: Writing acceptance criteria, story points, estimation
- Backlog Management: Prioritization, grooming, epic breakdown
- Agile Roles: Product Owner, Scrum Master, Development Team interactions
- Metrics: Velocity, burndown charts, cycle time, throughput
Analytics Project Lifecycle
- Problem Definition: Business objectives, success metrics, constraints
- Data Collection: Source identification, access requirements, data assessment
- Analysis & Modeling: Approach selection, iteration, validation
- Insights & Recommendations: Actionable findings, impact quantification
- Implementation: Deployment, monitoring, handoff documentation
- Post-Deployment: Performance tracking, refinement, lessons learned
Independent Problem Solving
- Navigating organizational complexity and finding stakeholders
- Resourcefulness in finding data and documentation
- Breaking down ambiguous requirements into actionable tasks
- Risk identification and mitigation strategies
- Escalation protocols and when to ask for help
6. Business Analytics
Building Business Cases
- Cost-Benefit Analysis: ROI calculation, NPV, payback period
- Impact Sizing: Market size estimation, addressable opportunity
- Assumptions & Risks: Sensitivity analysis, scenario planning
- Implementation Roadmap: Phased approach, resource requirements
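The cost-benefit measures named above (ROI, NPV, payback period) are short calculations. The sketch below uses hypothetical cash flows and a 10% discount rate chosen only for illustration; the payback function counts whole periods and does not interpolate within a period.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] is the upfront (period 0) flow."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def roi(total_benefit, total_cost):
    """Return on investment as a fraction of cost."""
    return (total_benefit - total_cost) / total_cost


def payback_period(cash_flows):
    """First period in which cumulative cash flow is non-negative, or None."""
    cumulative = 0.0
    for t, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return t
    return None


# Hypothetical automation project: upfront cost, then three years of savings
flows = [-500_000, 200_000, 250_000, 300_000]
project_npv = npv(0.10, flows)
project_roi = roi(total_benefit=750_000, total_cost=500_000)
project_payback = payback_period(flows)

print(f"NPV: {project_npv:,.0f}, ROI: {project_roi:.0%}, payback: year {project_payback}")
```

Re-running the same functions with pessimistic and optimistic cash flows is a lightweight form of the sensitivity analysis listed under Assumptions & Risks.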
KPIs & Metrics Development
- Operations Metrics:
  - Average Handle Time (AHT), First Call Resolution (FCR)
  - Service Level, Abandonment Rate, Agent Utilization
  - Quality scores, Compliance rates
- Banking Metrics Formulas:
  - AHT = (Total Talk Time + Total Hold Time + Total Wrap-up Time) / Total Calls
  - FCR = (Calls Resolved on First Contact / Total Calls) × 100
  - Service Level = (Calls Answered within Threshold / Total Calls Offered) × 100
  - NPS = % Promoters (9-10) - % Detractors (0-6)
- Customer Experience:
  - Net Promoter Score (NPS), Customer Satisfaction (CSAT)
  - Customer Effort Score (CES), Churn rate
  - Journey completion rates, Drop-off analysis
- Product Analytics:
  - Adoption rate, Feature usage, Retention cohorts
  - Conversion funnels, Time-to-value
- ML Model Performance:
  - Prediction accuracy, Model lift, Business impact
  - Cost savings, Efficiency gains
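The AHT, FCR, Service Level, and NPS formulas in this section can be expressed as small functions. This is a sketch with made-up inputs; the function names are hypothetical and times are assumed to be in seconds.

```python
def average_handle_time(talk_sec, hold_sec, wrap_sec, total_calls):
    """AHT in seconds per call."""
    return (talk_sec + hold_sec + wrap_sec) / total_calls


def first_call_resolution(resolved_first_contact, total_calls):
    """FCR as a percentage of all calls."""
    return 100 * resolved_first_contact / total_calls


def service_level(answered_within_threshold, calls_offered):
    """Percentage of offered calls answered within the threshold."""
    return 100 * answered_within_threshold / calls_offered


def net_promoter_score(scores):
    """NPS from 0-10 survey scores: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical totals for 1,000 calls
aht = average_handle_time(talk_sec=300_000, hold_sec=40_000, wrap_sec=52_000, total_calls=1_000)
fcr = first_call_resolution(resolved_first_contact=785, total_calls=1_000)
sl = service_level(answered_within_threshold=820, calls_offered=1_000)
nps = net_promoter_score([10, 9, 9, 8, 7, 6, 3, 10, 9, 5])
```

Passives (scores of 7 and 8) count toward the denominator but not toward either group, so NPS ranges from -100 to 100.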
Data Storytelling
- Narrative Structure: Context → Complication → Resolution
- Audience Adaptation: Technical vs executive presentations
- Visualization Principles:
  - Chart selection (bar, line, scatter, heatmap)
  - Color usage, labeling, annotations
  - Reducing chart junk, emphasizing key insights
- Insights vs Observations: "So what?" test, actionability
- Executive Summaries: BLUF (Bottom Line Up Front), pyramid principle
Exploratory Analysis for Product Opportunities
- Identifying pain points in user journeys
- Benchmarking against industry standards
- Cohort analysis to understand behavior segments
- Root cause analysis techniques (5 Whys, Fishbone diagrams)
- Prioritization frameworks (RICE, ICE, Value vs Effort)
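As an example of one prioritization framework mentioned above, RICE scores an initiative as Reach × Impact × Confidence / Effort. The initiatives and their input values below are invented for illustration, and the scales (impact as a multiplier, confidence as a fraction, effort in person-months) are assumptions that teams define for themselves.

```python
def rice_score(reach, impact, confidence, effort):
    """RICE = (Reach x Impact x Confidence) / Effort."""
    return reach * impact * confidence / effort


# Hypothetical backlog items with assumed inputs
initiatives = {
    "Self-service password reset": rice_score(reach=30_000, impact=2.0, confidence=0.8, effort=4),
    "Dispute status tracker": rice_score(reach=12_000, impact=3.0, confidence=0.5, effort=6),
    "Chatbot balance inquiry": rice_score(reach=20_000, impact=1.0, confidence=0.9, effort=3),
}

# Highest score first
ranked = sorted(initiatives, key=initiatives.get, reverse=True)
for name in ranked:
    print(f"{name}: {initiatives[name]:,.0f}")
```

The score is only as good as its inputs, so it works best as a starting point for discussion rather than a final ranking.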
7. Communication & Stakeholder Management
Working with Senior Leaders
- Executive Presence: Confidence, clarity, conciseness
- Meeting Preparation: Pre-reads, agenda setting, objectives
- Presenting Recommendations: Options analysis, pros/cons, recommended path
- Handling Questions: Bridging techniques, parking lot for tangents
- Follow-up: Action items, timelines, accountability
Cross-Functional Collaboration
- Engineering Teams: Technical feasibility, data requirements, API specs
- Product Teams: Feature prioritization, user stories, success metrics
- Design Teams: User research insights, prototype testing data
- Business Units: Requirements gathering, change management, training
- Influence without Authority: Building trust, finding win-wins, persistence
Presentation Skills
- Structuring slides for different audiences (technical vs business)
- Using appendix for detailed methodology and assumptions
- Rehearsing timing and transitions
- Managing Q&A sessions
- Creating compelling dashboards that tell a story
8. Banking & Financial Services Domain
Consumer & Community Banking (CCB)
- Product Lines:
  - Personal banking (checking, savings accounts)
  - Credit cards, mortgages, auto financing
  - Investment advice, wealth management
  - Small business banking
- Customer Lifecycle: Acquisition, onboarding, engagement, retention, churn
- Servicing Operations:
  - Call centers, branch operations, digital servicing
  - Dispute resolution, fraud prevention
  - Account maintenance, payment processing
Operational Excellence in Banking
- Cost Reduction: Process automation, straight-through processing, self-service adoption
- Efficiency Metrics: Cost per transaction, Cost per account, Operating expense ratio
- Regulatory Compliance: KYC/AML, GDPR/privacy, audit trails
- Risk Management: Fraud detection, credit risk, operational risk
ML/AI Use Cases in Banking Operations
- Customer Service: Chatbots, intent classification, sentiment analysis, call routing
- Fraud Detection: Anomaly detection, transaction monitoring, pattern recognition
- Process Automation: Document processing (OCR), data extraction, RPA augmentation
- Predictive Analytics: Churn prediction, product recommendations, loan default prediction
- Operational Forecasting: Call volume prediction, staffing optimization
Understanding the Quad
- Four Key Functions:
  - Product: Feature roadmap, user experience
  - Engineering: Technical implementation, infrastructure
  - Design: User research, interface design
  - Data/Analytics: Insights, measurement, optimization
- How analytics informs product roadmap decisions
- Establishing communication networks across teams
9. Dashboard Design Best Practices
Design Principles
1. Know Your Audience:
  - Executive: High-level KPIs, trends, anomalies
  - Manager: Performance tracking, comparisons, drill-downs
  - Analyst: Granular data, filters, ad-hoc exploration
2. Information Hierarchy:
  - Most important metrics top-left (Western reading pattern)
  - Group related metrics together
  - Progressive disclosure: summary → details
3. Visual Best Practices:
  - Limit colors (3-4 max for categories)
  - Use white space effectively
  - Consistent formatting (dates, numbers, percentages)
  - Clear labels, no jargon
  - Include context (targets, benchmarks, previous period)
4. Interactivity:
  - Filters should be prominent but not dominate
  - Drill-down paths should be intuitive
  - Tooltips for additional context
  - Reset button to return to default view
Chart Selection Guide
- Comparison: Bar chart (categorical), Column chart (time-based)
- Trend over time: Line chart
- Part-to-whole: Pie/donut (if <6 categories), Stacked bar, Treemap
- Distribution: Histogram, Box plot, Violin plot
- Correlation: Scatter plot, Bubble chart
- Magnitude: Bullet chart (vs target), KPI cards
- Ranking: Horizontal bar chart (sorted)
- Geographical: Filled map, Symbol map
- Multi-dimensional: Heatmap, Small multiples
KPI Dashboard Structure (Operations)
┌─────────────────────────────────────────────────────────────┐
│ OPERATIONS PERFORMANCE DASHBOARD Month: Nov │
├─────────────────────────────────────────────────────────────┤
│ [Filters: Date Range | Channel | Region | Product] │
├─────────────────────────────────────────────────────────────┤
│ │
│ KEY METRICS (Big Numbers with Trend Indicators) │
│ ┌──────────────┬──────────────┬──────────────┬──────────┐ │
│ │ Call Volume │ Avg Handle │ First Call │ CSAT │ │
│ │ 125,430 │ Time │ Resolution │ 4.2/5.0 │ │
│ │ ▲ 5.2% │ 6:32 min │ 78.5% │ ▼ 0.3 │ │
│ │ │ ▼ 12s │ ▲ 2.1% │ │ │
│ └──────────────┴──────────────┴──────────────┴──────────┘ │
│ │
│ TRENDS (Line Charts) │
│ ┌────────────────────────────┬───────────────────────────┐ │
│ │ Call Volume - Last 30 Days │ FCR Rate by Channel │ │
│ │ [Line chart showing daily │ [Multi-line: Phone, Chat, │ │
│ │ volume with 7-day MA] │ Email trends] │ │
│ └────────────────────────────┴───────────────────────────┘ │
│ │
│ PERFORMANCE BREAKDOWN │
│ ┌────────────────────────────┬───────────────────────────┐ │
│ │ Top Call Drivers │ Agent Performance │ │
│ │ [Horizontal bar chart: │ [Scatter: AHT vs CSAT │ │
│ │ 1. Password reset - 25% │ with bubble size = vol] │ │
│ │ 2. Balance inquiry - 18% │ │ │
│ │ 3. Dispute - 15%...] │ │ │
│ └────────────────────────────┴───────────────────────────┘ │
│ │
│ ALERTS & ACTIONS │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ ⚠ CSAT below target in West region (-0.5) │ │
│ │ ⚠ Call volume spike on Nov 15 (+35%) - investigate │ │
│ │ ✓ AHT improvement target met for 3rd consecutive month│ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ Last updated: 2024-11-18 08:00 ET [Export] [Schedule] │
└─────────────────────────────────────────────────────────────┘
Common Mistakes to Avoid
- ❌ 3D charts, pie charts with many slices, dual axes with different scales
- ❌ Too many metrics (cognitive overload)
- ❌ Misleading axes (not starting at zero for bar charts)
- ❌ Red/green without alternative encoding (colorblind accessibility)
- ❌ Meaningless default sort (alphabetical when magnitude matters)
- ❌ No context (numbers without comparison)
- ✅ Simple, focused, actionable insights
- ✅ Clear data sources and refresh time
- ✅ Mobile-responsive design
10. Sources & References
Statistical Foundations
- Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Duxbury Press.
- Montgomery, D. C., Peck, E. A., & Vining, G. G. (2012). Introduction to Linear Regression Analysis (5th ed.). Wiley.
- Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer.
- Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
Machine Learning & AI
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer. https://web.stanford.edu/~hastie/ElemStatLearn/
- Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. https://www.deeplearningbook.org/
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning. Springer. https://www.statlearning.com/
- Breiman, L. (2001). "Random Forests." Machine Learning, 45(1), 5-32.
- Chen, T., & Guestrin, C. (2016). "XGBoost: A Scalable Tree Boosting System." Proceedings of the 22nd ACM SIGKDD. arXiv:1603.02754
Data Visualization & Communication
- Tufte, E. R. (2001). The Visual Display of Quantitative Information (2nd ed.). Graphics Press.
- Cairo, A. (2016). The Truthful Art: Data, Charts, and Maps for Communication. New Riders.
- Knaflic, C. N. (2015). Storytelling with Data: A Data Visualization Guide for Business Professionals. Wiley.
- Few, S. (2012). Show Me the Numbers: Designing Tables and Graphs to Enlighten (2nd ed.). Analytics Press.
- Wilke, C. O. (2019). Fundamentals of Data Visualization. O'Reilly. https://clauswilke.com/dataviz/
Python & Data Science Tools
- McKinney, W. (2017). Python for Data Analysis (2nd ed.). O'Reilly.
- VanderPlas, J. (2016). Python Data Science Handbook. O'Reilly. https://jakevdp.github.io/PythonDataScienceHandbook/
- Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). O'Reilly.
Business Analytics & Product
- Davenport, T. H., & Harris, J. G. (2007). Competing on Analytics: The New Science of Winning. Harvard Business Press.
- Provost, F., & Fawcett, T. (2013). Data Science for Business. O'Reilly.
- Croll, A., & Yoskovitz, B. (2013). Lean Analytics: Use Data to Build a Better Startup Faster. O'Reilly.
- Kohavi, R., Tang, D., & Xu, Y. (2020). Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press.
Key Research Papers
- Lundberg, S. M., & Lee, S. I. (2017). "A Unified Approach to Interpreting Model Predictions." NIPS. arXiv:1705.07874
- Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why Should I Trust You?: Explaining the Predictions of Any Classifier." KDD. arXiv:1602.04938
- McInnes, L., Healy, J., & Melville, J. (2018). "UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction." arXiv:1802.03426
- Ke, G., et al. (2017). "LightGBM: A Highly Efficient Gradient Boosting Decision Tree." NIPS.
Cloud & Engineering
- Kleppmann, M. (2017). Designing Data-Intensive Applications. O'Reilly.
- AWS Documentation. https://docs.aws.amazon.com/
- Reis, J., & Housley, M. (2022). Fundamentals of Data Engineering. O'Reilly.
Note on References:
These resources provide comprehensive coverage of the concepts outlined in this guide. For interview preparation, focus on understanding core principles rather than memorizing formulas. Practical experience with Python, SQL, and data visualization tools is essential.