Big Data Developer Job at Capgemini, New York, NY

  • Capgemini
  • New York, NY

Job Description

We’re looking for a seasoned Senior Data Engineer with strong Hadoop experience to design, build, and scale data pipelines and platforms powering analytics, AI/ML, and business operations. You’ll own end-to-end data engineering, from ingestion and transformation to performance optimization, across large-scale distributed systems and modern cloud data platforms.

Key Responsibilities

  • Design & Build Data Pipelines: Architect, develop, and maintain robust ETL/ELT pipelines for batch and streaming data using the Hadoop ecosystem, Spark, and Airflow.
  • Big Data Architecture: Define and implement scalable big data architectures, ensuring reliability, fault tolerance, and cost efficiency.
  • Data Modeling: Develop and optimize data models for the Data Warehouse and Operational Data Store (ODS); ensure conformed dimensions and star/snowflake schemas where appropriate.
  • SQL Expertise: Write, optimize, and review complex SQL/HiveQL queries for large datasets; enforce query standards and patterns.
  • Performance Tuning: Optimize Spark jobs, SQL queries, storage formats (e.g., Parquet/ORC), partitioning, and indexing to improve latency and throughput.
  • Data Quality & Governance: Implement data validation, lineage, cataloging, and security controls across environments.
  • Workflow Orchestration: Build and manage DAGs in Airflow, ensuring observability, retries, alerting, and SLAs.
  • Cross-functional Collaboration: Partner with Data Science, Analytics, and Product teams to deliver reliable datasets and features.
  • Best Practices: Champion coding standards, CI/CD, infrastructure-as-code (IaC), and documentation across the data platform.

Required Qualifications

  • 7+ years of hands-on data engineering experience building production-grade pipelines.
  • Strong experience with Hadoop (HDFS, YARN), Hive SQL/HiveQL, Spark (Scala/Java/PySpark), and Airflow.
  • Expert-level SQL skills with the ability to write and tune complex queries on large datasets.
  • Solid understanding of Big Data architecture patterns (e.g., lakehouse, data lake + warehouse, CDC).
  • Deep knowledge of ETL/ELT and DW/ODS concepts (slowly changing dimensions, partitioning, columnar storage, incremental loads).
  • Proven track record in performance tuning for large-scale systems (Spark jobs, shuffle optimizations, broadcast joins, skew handling).
  • Strong programming background in Java and/or Scala (Python is a plus).

Preferred Skills

  • Experience with AI-driven data processing (feature engineering pipelines, ML-ready datasets, model data dependencies).
  • Hands-on experience with cloud data platforms (AWS, GCP, or Azure) and services like EMR/Dataproc/HDInsight, S3/GCS/ADLS, Glue/Dataflow, and BigQuery/Snowflake/Redshift/Synapse.
  • Exposure to NoSQL databases (Cassandra, HBase, DynamoDB, MongoDB).
  • Advanced data governance & security (row/column-level security, tokenization, encryption at rest/in transit, IAM/RBAC, data lineage/catalog).
  • Familiarity with Kafka (topics, partitions, consumer groups, schema registry, stream processing).
  • Experience with CI/CD for data (Git, Jenkins/GitHub Actions, Terraform) and containerization (Docker, Kubernetes).
  • Knowledge of metadata management and data observability (Great Expectations, Monte Carlo, OpenLineage).

Life at Capgemini:

Capgemini supports all aspects of your well-being throughout the changing stages of your life and career. For eligible employees, we offer:

  • Flexible work
  • Healthcare, including dental, vision, mental health, and well-being programs
  • Financial well-being programs such as 401(k) and Employee Share Ownership Plan
  • Paid time off and paid holidays
  • Paid parental leave
  • Family building benefits like adoption assistance, surrogacy, and cryopreservation
  • Social well-being benefits like subsidized back-up child/elder care and tutoring
  • Mentoring, coaching, and learning programs
  • Employee Resource Groups
  • Disaster Relief

Disclaimer:

Capgemini is an Equal Opportunity Employer encouraging diversity in the workplace. All qualified applicants will receive consideration for employment without regard to race, national origin, gender identity/expression, age, religion, disability, sexual orientation, genetics, veteran status, marital status or any other characteristic protected by law.

This is a general description of the Duties, Responsibilities and Qualifications required for this position. Physical, mental, sensory or environmental demands may be referenced in an attempt to communicate the manner in which this position traditionally is performed. Whenever necessary to provide individuals with disabilities an equal employment opportunity, Capgemini will consider reasonable accommodations that might involve varying job requirements and/or changing the way this job is performed, provided that such accommodations do not pose an undue hardship.

Capgemini is committed to providing reasonable accommodations during our recruitment process. If you need assistance or accommodation, please reach out to your recruiting contact.


