Muya Chang


ASIC & VLSI Research Scientist


Focus: VLSI, Computing Systems, Hardware-Algorithm Co-design, Machine Learning.

About Me

Hello, I am an ASIC & VLSI research scientist at NVIDIA, where we research and develop creative and innovative ASIC and VLSI design techniques, machine learning accelerator approaches, and novel digital VLSI circuits. I really like what I do because it draws on much of the background I have built up over the years.
Before that, I was at Georgia Tech from 2016 to 2022. I started the Ph.D. program in Electrical and Computer Engineering in Fall 2016. About halfway through the program, I realized that I was interested in many areas of Computer Science and also lacked background in them, so I applied to the M.S. program in Computer Science to strengthen my coding skills and fill in the gaps in my knowledge. After taking 20+ courses in 9 semesters, I proudly completed both degrees in 2020. I also met my wife in CS 7210, where we were in the same group and put a lot of effort into completing a lab from UW, one of the most challenging and interesting labs I have ever done. We started dating after the course ended and got married a year later.
I was a member of the Integrated Circuits and Systems Research Lab, advised by ECE Professor Arijit Raychowdhury. Arijit has been a wonderful advisor, mentor, and role model, and I can't express how much I learned from him through countless days of making chips and debugging all kinds of weird problems. Thanks to Arijit, I also had the honor of interning at Qualcomm, where I met another lifelong mentor, Dr. Keith Bowman. My main focus there was researching and developing near-memory computing to reduce data movement while maintaining cache area efficiency.

Work Experience

NVIDIA
Research Scientist
NVIDIA @ Santa Clara CA, USA
ASIC & VLSI Group
May 2022 - Present
  • Research and develop creative and innovative ASIC and VLSI design techniques, machine learning accelerator approaches, and/or novel digital VLSI circuits.
  • Contribute to novel research advancing the state-of-the-art in machine learning accelerator design.
  • Collaborate on the development of research prototype testchips.
  • Develop and apply machine learning to ASIC and VLSI design tool flows.
  • Collaborate with circuits and architecture team members in research and product teams.
  • Publish and present original research, and speak at conferences and events.
  • Collaborate with external researchers and a diverse set of internal product teams.
University of Notre Dame
Research Associate
University of Notre Dame @ Notre Dame IN, USA
ECE Department, Supervisor: Ningyuan Cao
Dec 2021 - Present
  • Develop an organized tape-out tutorial for incoming PhD students.
Georgia Institute of Technology
Postdoctoral Fellow
Georgia Institute of Technology @ Atlanta GA, USA
ECE Department, Advisor: Arijit Raychowdhury
Jan 2021 - May 2022
Qualcomm
Interim Intern
Qualcomm @ Raleigh NC, USA
Corporate Research and Development (CRD) Processor Research Team
May 2019 - Aug 2019
  • Researching and developing near-memory computing to reduce data movement while maintaining cache area efficiency.
  • Optimizing the SRAM sub-array size and number of MACs per sub-array based on target DNN workloads for future ML accelerators.
  • Exploring circuit and micro-architectural techniques to further optimize the performance, energy efficiency, and area.
Georgia Institute of Technology
Researcher
Georgia Institute of Technology @ Atlanta GA, USA
ECE Department, Advisor: Arijit Raychowdhury
Aug 2016 - Dec 2020
M2COMM
Marketing Engineer
M2COMM @ Hsinchu, Taiwan
Marketing Team
Sep 2015 - Aug 2017
  • Develop the company website.
  • Manage product-related technical documents.

Achievements

Awards

  • “Code-a-Chip” Travel Grant Awards, IEEE International Solid-State Circuits Conference
    IEEE ISSCC, Feb 2023
  • Best Paper Award, IEEE Opportunity Research Scholars Symposium
    IEEE ORSS, Apr 2022
  • Best Paper Award, IEEE Custom Integrated Circuits Conference (CICC)
    IEEE CICC, Apr 2021
  • Taiwan Government Scholarship to Study Abroad (GSSA)
    Taiwan Ministry of Education, May 2019
  • Qualcomm Innovation Fellowship Award
    Qualcomm, May 2019
  • Chih Foundation Graduate Student Research Publication Award
    Chih Foundation, May 2019
  • ECE Graduate TA Excellence Award
    Georgia Institute of Technology, Apr 2019
  • Exchange Program Scholarship to University of Illinois at Urbana-Champaign
    National Chiao Tung University, Jan 2014
  • Calculus Award (Top 20/1166)
    National Chiao Tung University, Jun 2011
  • Calculus Award (Top 20/1202)
    National Chiao Tung University, Jan 2011
  • Merit Scholarships in the Department of Electronics Engineering
    National Chiao Tung University, Sep 2010

Certifications

  • Live Training (CEU/PDH)
    IEEE, Nov 2019

In the News

  • New Optimization Chip Tackles Machine Learning, 5G Routing
    IEEE Spectrum, May 2019

Grad. Research

My research interests include energy-efficient hardware design for distributed optimization. During my PhD, I had the chance to work on FPGA design, ASIC design, and automated measurement development. The more I learned along the way, the more fascinated I became by the impact technology has had on our lives.

Heterogeneous RRAM In-Memory and SRAM Near-Memory SoC for Hybrid Frame and Event-Based Target Tracking

Related publications

SoC Design on Resistive-RAM (RRAM) with Integrated ARM Cortex M3

Related publications
  • A 40nm 60.64TOPS/W ECC-Capable Compute-in-Memory/Digital 2.25MB/768KB RRAM/SRAM System with Embedded Cortex M3 Microprocessor for Edge Recommendation Systems
    M. Chang, S. Spetalnick, B. Crafton, W. Khwa, Y. Chih, M. Chang, A. Raychowdhury
    2022 IEEE ISSCC
  • A 40nm 64kb 26.56 TOPS/W 2.37Mb/mm2 RRAM Binary/Compute-in-Memory Macro with 4.23x Improvement in Density and > 75% use of Sensing Dynamic Range
    S. Spetalnick, M. Chang, B. Crafton, W. Khwa, Y. Chih, M. Chang, A. Raychowdhury
    2022 IEEE ISSCC
  • Experimental Fault Rate Characterization and Protection in Embedded RRAM
    C. Talley, B. Crafton, S. Spetalnick, M. Chang, A. Raychowdhury
    2022 IEEE ORSS

AC-SAT Solver on ASIC

Related publications
  • An Analog Clock-free Compute Fabric base on Continuous-Time Dynamical System for Solving Combinatorial Optimization Problems
    M. Chang, X. Yin, Z. Toroczkai, X. Hu, A. Raychowdhury
    2022 IEEE CICC

Optimo: ASIC for Distributed Optimization

Related publications
  • Optimo: A 65nm 270MHz 143.2mW programmable spatial-array-processor with a hierarchical multi-cast on-chip network for solving distributed optimizations
    M. Chang, L.-H. Lin, J. Romberg, and A. Raychowdhury
    2019 IEEE CICC
  • Optimo: A 65nm 279GOPS/W 16b programmable spatial-array-processor with on-chip network for solving distributed optimizations via the alternating direction method of multipliers
    M. Chang, L.-H. Lin, J. Romberg, and A. Raychowdhury
    2019 IEEE JSSC

Hardware for Distributed Optimization on FPGA

Related publications
  • Efficient signal reconstruction via distributed least square optimization on a systolic FPGA architecture
    M. Chang, S. Gangopadhyay, T. Hamam, J. Romberg, and A. Raychowdhury
    2019 IEEE ICASSP

Service

  • 2024 IEEE JSSC Paper Reviewer
  • 2024 IEEE VLSI Paper Reviewer
  • 2023 IEEE AICAS Paper Reviewer
  • 2023 IEEE JSSC Paper Reviewer
  • 2023 IEEE ISSCC Paper Reviewer
  • 2021 IEEE TVLSI Paper Reviewer
  • 2021 IEEE AICAS Paper Reviewer
  • 2020 IEEE AICAS Paper Reviewer

Education

Georgia Institute of Technology
Ph.D., Electrical and Computer Engineering, GPA 4.0/4.0
Specialization: Very Large Scale Integration (VLSI)
Aug 2016 - May 2020
  • ECE 6122 - Advanced Programming Techniques
  • ECE 6130 - Advanced VLSI Systems
  • ECE 6133 - Physical Design Automation
  • ECE 6420 - Wireless IC Design
  • ECE 6500 - Fourier Techniques and Signal Analysis
  • ECE 8813 - Advanced Digital Design using Verilog
  • ECE 8823 - GPU Architecture
  • ECE 8843 - Mathematical Foundations of Machine Learning
  • ECE 8853 - Introduction to Quantum Computing Systems
  • ECE 8893 - Digital Systems at Nanometer Nodes
Georgia Institute of Technology
M.S., Computer Science, GPA 3.9/4.0
Specialization: Computing Systems
Aug 2019 - May 2020
  • CS 7210 - Distributed Computing
  • CS 7260 - Internetworking Architectures and Protocols
  • CSE 6230 - High Performance Computing
  • CSE 6242 - Data and Visual Analytics
  • CS 6220 - Big Data Systems & Analytics
  • CS 6210 - Advanced Operating Systems
  • CS 6505 - Computability & Algorithms
  • CS 6290 - High Performance Computer Architecture
University of Illinois at Urbana-Champaign
Exchange Program, Electrical & Computer Engineering
Jan 2014 - May 2014
  • ECE 350 - Fields and Waves II
  • ECE 417 - Multimedia Signal Processing
  • ECE 483 - Analog IC Design
  • CS 484 - Parallel Programming
National Chiao Tung University
B.S., Electronics Engineering, GPA 3.9/4.0
Aug 2010 - Jun 2014

References

  • Dr. Arijit Raychowdhury
    Professor in School of Electrical and Computer Engineering
    Georgia Institute of Technology @ Atlanta, GA, USA
    Relation: PhD Advisor (Aug 2016 - Present)
  • Dr. Keith Bowman
    Principal Engineer and Manager
    Qualcomm @ Raleigh, NC, USA
    Relation: Manager (May 2019 - Aug 2019)
  • Dr. Justin Romberg
    Schlumberger Professor
    Georgia Institute of Technology @ Atlanta, GA, USA
    Relation: Research co-advisor (Sep 2016 - Dec 2020)
  • Dr. Tushar Krishna
    Associate Professor
    Georgia Institute of Technology @ Atlanta, GA, USA
    Relation: Research committee (Nov 2019 - Dec 2020)

Publications

Journals

  • In-Situ Privacy via Mixed-Signal Perturbation and Hardware-Secure Data Reversibility
    S. Davis, J. Liu, B. Chang, M. Chang, N. Cao
    2024 IEEE TCAS I
  • E-Gaze: Gaze Estimation with Event Camera
    N. Li, M. Chang, A. Raychowdhury
    2024 IEEE TPAMI
  • A 40nm Compute-in-Memory Macro With RRAM Addressing IR Drop and Off-State Current
    S. Spetalnick, M. Chang, S. Konno, B. Crafton, A. Lele, W. Khwa, Y. Chih, M. Chang, A. Raychowdhury
    2023 IEEE SSC-L
  • A heterogeneous RRAM in-memory and SRAM near-memory SoC for fused frame and event-based target identification and tracking
    A. Lele, M. Chang, S. Spetalnick, B. Crafton, S. Konno, Z. Wan, A. Bhat, W. Khwa, Y. Chih, M. Chang, A. Raychowdhury
    2023 IEEE JSSC
  • A 40-nm 118.44-TOPS/W Voltage-Sensing Compute-in-Memory RRAM Macro With Write Verification and Multi-Bit Encoding
    J. Yoon, M. Chang, W. Khwa, Y. Chih, M. Chang, A. Raychowdhury
    2022 IEEE JSSC
  • A 40-nm, 64-Kb, 56.67 TOPS/W Voltage-Sensing Computing-In-Memory/Digital RRAM Macro Supporting Iterative Write With Verification and Online Read-Disturb Detection
    J. Yoon, M. Chang, W. Khwa, Y. Chih, M. Chang, A. Raychowdhury
    2021 IEEE JSSC
  • EM and Power SCA-resilient AES-256 through >350× Current Domain Signature Attenuation & Local Lower Metal Routing
    D. Das, J. Daniel, A. Golder, N. Modak, S. Maity, B. Chatterjee, D. Seo, M. Chang, A. Varna, H. Krishnamurthy, S. Mathew, S. Ghosh, A. Raychowdhury, S. Sen
    2020 IEEE JSSC
  • Optimo: A 65nm 279GOPS/W 16b programmable spatial-array-processor with on-chip network for solving distributed optimizations via the alternating direction method of multipliers
    M. Chang, L.-H. Lin, J. Romberg, and A. Raychowdhury
    2019 IEEE JSSC
  • A 65-nm 8-to-3-b 1.0-0.36-V 9.1-1.1-TOPS/W hybrid-digital-mixed-signal computing platform for accelerating swarm robotics
    N. Cao, M. Chang, and A. Raychowdhury
    2019 IEEE JSSC
  • A FerroFET based in-memory processor for solving distributed and iterative optimizations via least-squares method
    I. Yoon, M. Chang, K. Ni, M. Jerry, S. Gangopadhyay, G. Smith, T. Hamam, J. Romberg, V. Narayanan, A. Khan, S. Datta, A. Raychowdhury
    2019 IEEE JXCDC

Conferences

  • 30.1 A 40nm VLIW Edge Accelerator with 5MB of 0.256 pJ/b RRAM and a Localization Solver for Bristle Robot Surveillance
    S. Spetalnick, A. Lele, B. Crafton, M. Chang, S. Ryu, J. Yoon, Z. Hao, A. Ansari, W. Khwa, Y. Chih, M. Chang, A. Raychowdhury
    2024 IEEE ISSCC
  • Neuromorphic swarm on RRAM compute-in-memory processor for solving QUBO problem
    A. Lele, M. Chang, S. Spetalnick, B. Crafton, A. Raychowdhury, Y. Fang
    2023 IEEE DAC
  • A 2.38 MCells/mm2 9.81-350 TOPS/W RRAM Compute-in-Memory Macro in 40nm CMOS with Hybrid Offset/IOFF Cancellation and ICELL RBLSL Drop Mitigation
    S. Spetalnick, M. Chang, S. Konno, B. Crafton, A. Lele, W. Khwa, Y. Chih, M. Chang, A. Raychowdhury
    2023 IEEE VLSI
  • Live Demonstration: Hybrid RRAM and SRAM SoC for Fused Frame and Event Target Tracking
    A. Lele, M. Chang, S. Spetalnick, Y. Fang, B. Crafton, S. Konno, A. Raychowdhury
    2023 IEEE ISCAS
  • A 65 nm 1.4-6.7 TOPS/W adaptive-SNR sparsity-aware CIM core with load balancing support for DL workloads
    M. Ali, I. Chakraborty, S. Choudhary, M. Chang, D. Kim, A. Raychowdhury, K. Roy
    2023 IEEE CICC
  • Privacy-by-Sensing with Time-domain Differentially-Private Compressed Sensing
    J. Liu, B. Cheng, P. Zeng, S. Davis, M. Chang, N. Cao
    2023 IEEE DATE
  • A GaN-Based Reconfigurable Series-Parallel Hybrid Converter Supporting 48/24/12V Input and 0.8-1.2 V Output with 83.7/87.8/90.7% Peak Efficiency
    M. Gong, H. Chen, M. Chang, J. Yoon, X. Zhang, R. Jain, A. Raychowdhury
    2023 IEEE APEC
  • A 73.53 TOPS/W 14.74 TOPS heterogeneous RRAM in-memory and SRAM near-memory SoC for hybrid frame and event-based target tracking
    M. Chang, A. Lele, S. Spetalnick, B. Crafton, S. Konno, Z. Wan, A. Bhat, W. Khwa, Y. Chih, M. Chang, A. Raychowdhury
    2023 IEEE ISSCC
  • Stochastic Mixed-Signal Circuit Design for In-Sensor Privacy
    J. Liu, B. Cheng, M. Chang, N. Cao
    2022 IEEE ICCAD
  • Experimental Fault Rate Characterization and Protection in Embedded RRAM
    C. Talley, B. Crafton, S. Spetalnick, M. Chang, A. Raychowdhury
    2022 IEEE ORSS
  • An Analog Clock-free Compute Fabric base on Continuous-Time Dynamical System for Solving Combinatorial Optimization Problems
    M. Chang, X. Yin, Z. Toroczkai, X. Hu, A. Raychowdhury
    2022 IEEE CICC
  • A 40nm 60.64TOPS/W ECC-Capable Compute-in-Memory/Digital 2.25MB/768KB RRAM/SRAM System with Embedded Cortex M3 Microprocessor for Edge Recommendation Systems
    M. Chang, S. Spetalnick, B. Crafton, W. Khwa, Y. Chih, M. Chang, A. Raychowdhury
    2022 IEEE ISSCC
  • A 40nm 64kb 26.56 TOPS/W 2.37Mb/mm2 RRAM Binary/Compute-in-Memory Macro with 4.23x Improvement in Density and > 75% use of Sensing Dynamic Range
    S. Spetalnick, M. Chang, B. Crafton, W. Khwa, Y. Chih, M. Chang, A. Raychowdhury
    2022 IEEE ISSCC
  • A 40nm 100Kb 118.44TOPS/W Ternary-weight Compute-in-Memory RRAM Macro with Voltage-sensing Read and Write Verification for reliable multi-bit RRAM operation
    J. Yoon, M. Chang, W. Khwa, Y. Chih, M. Chang, A. Raychowdhury
    2021 IEEE CICC
  • A 40nm 64Kb 56.67TOPS/W Read-Disturb-Tolerant Compute-in-Memory/Digital RRAM Macro with Active-Feedback-Based Read and In-Situ Write Verification
    J. Yoon, M. Chang, W. Khwa, Y. Chih, M. Chang, A. Raychowdhury
    2021 IEEE ISSCC
  • A 65nm Thermometer-Encoded Time-Based Compute-in-Memory Neural Network Accelerator at 0.735pJ/MAC and 0.41pJ/Update
    M. Gong, N. Cao, M. Chang, A. Raychowdhury
    2020 IEEE TCAS-II
  • A 65nm Image Processing SoC Supporting Multiple DNN Models and Real-Time Computation-Communication Trade-Off Via Actor-Critical Neuro-Controller
    N. Cao, B. Chatterjee, M. Gong, M. Chang, S. Sen, A. Raychowdhury
    2020 IEEE VLSI
  • EM and Power SCA-Resilient AES-256 in 65nm CMOS Through >350x Current-Domain Signature Attenuation
    D. Das, J. Daniel, A. Golder, N. Modak, S. Maity, B. Chatterjee, D. Seo, M. Chang, A. Varna, H. Krishnamurthy, S. Mathew, S. Ghosh, A. Raychowdhury, S. Sen
    2020 IEEE ISSCC
  • Efficient signal reconstruction via distributed least square optimization on a systolic FPGA architecture
    M. Chang, S. Gangopadhyay, T. Hamam, J. Romberg, and A. Raychowdhury
    2019 IEEE ICASSP
  • Optimo: A 65nm 270MHz 143.2mW programmable spatial-array-processor with a hierarchical multi-cast on-chip network for solving distributed optimizations
    M. Chang, L.-H. Lin, J. Romberg, and A. Raychowdhury
    2019 IEEE CICC
  • 14.1 A 65nm 1.1-to-9.1 TOPS/W hybrid-digital-mixed-signal computing platform for accelerating model-based and model-free swarm robotics
    N. Cao, M. Chang, and A. Raychowdhury
    2019 IEEE ISSCC
  • A FeFET based processing-in-memory architecture for solving distributed least-square optimizations
    I. Yoon, M. Chang, K. Ni, M. Jerry, S. Gangopadhyay, G. Smith, T. Hamam, V. Narayanan, J. Romberg, S. Lu, S. Datta, A. Raychowdhury
    2018 IEEE DRC