| Acoustic Super Models for Large Scale Video Event Detection | R. Mertens, H. Lei, L. Gottlieb, G. Friedland, and A. Divakaran | Proceedings of the ACM International Workshop on Events in Multimedia (EiMM11), Scottsdale, Arizona | November 2011 | Speech | [PDF]
|
| Multimodal Location Estimation on Flickr Videos | G. Friedland, J. Choi, H. Lei, and A. Janin | Proceedings of the ACM International Workshop on Social Media (WSM11), Scottsdale, Arizona | November 2011 | Speech | [PDF]
|
| Fast Speaker Diarization Using a High-Level Scripting Language | E. Gonina, G. Friedland, H. Cook, and K. Keutzer | Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2011), Big Island, Hawaii | December 2011 | Speech | [PDF]
|
| On the Applicability of Speaker Diarization to Audio Concept Detection for Multimedia Retrieval | R. Mertens, P.-S. Huang, L. Gottlieb, G. Friedland, and A. Divakaran | Proceedings of the IEEE International Symposium on Multimedia, Dana Point, California, pp. 446-451 | December 2011 | Speech | [PDF]
|
| Don't Multiply Lightly: Quantifying Problems with the Acoustic Model Assumptions in Speech Recognition | D. Gillick, L. Gillick, and S. Wegmann | Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2011), Big Island, Hawaii | December 2011 | Speech | [PDF]
|
| Finding Difficult Speakers in Automatic Speaker Recognition | L. Stoll | UC Berkeley PhD thesis, Berkeley, California | December 2011 | Speech | [PDF]
|
| Narrative Theme Navigation for Sitcoms Supported by Fan-Generated Scripts | G. Friedland, A. Janin, and L. Gottlieb | To appear in Multimedia Tools and Applications, Springer | 2012 | Speech | [PDF]
|
| Syllable Models for Mandarin Speech Recognition: Exploiting Character Language Models | X. Liu, J. L. Hieronymus, M. J. F. Gales, and P. C. Woodland | In submission | 2012 | Speech | |
| Features Based on Auditory Physiology and Perception | R. M. Stern and N. Morgan | In Techniques for Noise Robustness in Automatic Speech Recognition, T. Virtanen, B. Raj, and R. Singh, eds., Wiley Publishing | 2012 | Speech | |
| Introduction to the Special Section on Deep Learning for Speech and Language Processing | D. Yu, G. Hinton, N. Morgan, J.-T. Chien, and S. Sagayama | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 1, pp. 4-6 | January 2012 | Speech | [PDF]
|
| Deep and Wide: Multiple Layers in Automatic Speech Recognition | N. Morgan | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 1, pp. 7-13 | January 2012 | Speech | [PDF]
|
| Special Section on New Frontiers in Rich Transcription | G. Friedland, J. Fiscus, T. Hain, and S. Furui (eds) | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, No. 2 | February 2012 | Speech | |
| Speaker Diarization: A Review of Recent Research | X. Anguera, S. Bozonnet, N. Evans, C. Fredouille, G. Friedland, and O. Vinyals | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 2, pp. 356-370 | February 2012 | Speech | [PDF]
|
| The ICSI RT-09 Speaker Diarization System | G. Friedland, A. Janin, D. Imseng, X. Anguera, L. Gottlieb, M. Huijbregts, M. Knox, and O. Vinyals | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 2, pp. 371-381 | February 2012 | Speech | [PDF]
|
| Multimodal City-Verification on Flickr Videos Using Acoustic and Textual Features | H. Lei, J. Choi, and G. Friedland | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan | March 2012 | Speech | [PDF]
|
| Spectro-Temporal Gabor Features for Speaker Recognition | H. Lei, B. T. Meyer, and N. Mirghafori | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan | March 2012 | Speech | [PDF]
|
| Discriminative Training for Speech Recognition is Compensating for Statistical Dependence in the HMM Framework | D. Gillick, S. Wegmann, and L. Gillick | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan | March 2012 | Speech | [PDF]
|
| How to Put It Into Words - Using Random Forests to Extract Symbol Level Descriptions from Audio Content for Concept Detection | P.-S. Huang, R. Mertens, A. Divakaran, G. Friedland, and M. Hasegawa-Johnson | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan | March 2012 | Speech | [PDF]
|
| Easy Does It: Robust Spectro-Temporal Many-Stream ASR Without Fine Tuning Streams | S. Ravuri and N. Morgan | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan | March 2012 | Speech | |
| Articulatory Features for Expressive Speech Synthesis | A. Black, H. T. Bunnell, Y. Dou, P. Kumar, F. Metze, D. Perry, T. Polzehl, K. Prahallad, S. Steidl, and C. Vaug | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan | March 2012 | Speech | [PDF]
|
| From AUDREY to Siri: Is Speech Recognition A Solved Problem? | R. Pieraccini | Presented at the Mobile Voice Conference, San Francisco, California | March 2012 | Speech | [PDF]
|
| Cybercasing the Joint: Language Technologies, Multimedia Retrieval, and Online Privacy | G. Friedland | Presented at the Language Technologies Institute Colloquium, Carnegie Mellon University, Pittsburgh, Pennsylvania | April 13, 2012 | Speech | [PDF]
|
| Speaker Diarization | G. Friedland and F. Valente | In Multimodal Signal Processing: Human Interactions in Meetings, S. Renals, H. Bourlard, J. Carletta, and A. Popescu-Belis, eds., Cambridge University Press | June 2012 | Speech | |
| Semi-Autonomous Car Control Using Brain Computer Interfaces | D. Goehring, D. Latotzky, M. Wang, and R. Rojas | Proceedings of the 12th International Conference of Intelligent Autonomous Systems (IAS), Jeju Island, Korea | June 2012 | Speech | |
| Multimodal Location Estimation of Consumer Media – Dealing with Sparse Training Data | J. Choi, G. Friedland, V. Ekambaram, and K. Ramchandran | Proceedings of the IEEE International Conference on Multimedia and Expo, Melbourne, Australia, pp. 43-48 | July 2012 | Speech | [PDF]
|
| Where did I go Wrong?: Identifying Troublesome Segments for Speaker Diarization Systems | M. T. Knox, N. Mirghafori, and G. Friedland | Proceedings of the 13th Annual Conference of the International Speech Communication Association (InterSpeech 2012), Portland, Oregon | September 2012 | Speech | [PDF]
|
| Hooking Up Spectro-Temporal Filters with Auditory-Inspired Representations for Robust Automatic Speech Recognition | B. Meyer, C. Spille, B. Kollmeier, and N. Morgan | Proceedings of the 13th Annual Conference of the International Speech Communication Association (InterSpeech 2012), Portland, Oregon | September 2012 | Speech | [PDF]
|
| Longer Features: They Do a Speech Detector Good | T. J. Tsai and N. Morgan | Proceedings of the 13th Annual Conference of the International Speech Communication Association (InterSpeech 2012), Portland, Oregon | September 2012 | Speech | |
| There is No Data Like Less Data: Percepts for Video Concept Detection on Consumer-Produced Media | B. Elizalde, G. Friedland, H. Lei, and A. Divakaran | Proceedings of the ACM International Workshop on Audio and Multimedia Methods for Large-Scale Video Analysis (AMVA) at ACM Multimedia 2012 (MM'12), Nara, Japan, pp. 27-32 | October 2012 | Speech | [PDF]
|
| Pushing the Limits of Mechanical Turk: Qualifying the Crowd for Video Geo-Location | L. Gottlieb, J. Choi, P. Kelm, T. Sikora, and G. Friedland | Proceedings of the ACM Workshop on Crowdsourcing for Multimedia (CrowdMM 2012), held in conjunction with ACM Multimedia 2012, Nara, Japan, pp. 23-28 | October 2012 | Speech | [PDF]
|
| The 2012 ICSI/Berkeley Video Location Estimation System | J. Choi, V. Ekambaram, G. Friedland, and K. Ramchandran | Presented at the MediaEval 2012 Workshop, Pisa, Italy | October 2012 | Speech | [PDF]
|
| Hearing is Believing: Biologically-Inspired Feature Extraction for Robust Automatic Speech Recognition | R. M. Stern and N. Morgan | IEEE Signal Processing Magazine, Vol. 29, No. 6, pp. 34-43 | November 2012 | Speech | [PDF]
|