| Automated Information Extraction in Production | R. Desutter, J.P. Evain, G. Friedland, A. Messina, and M. Sano | Special issue in Multimedia Tools and Applications, Springer | 2011 | Speech | |
| Computationally Efficient Clustering of Audio-Visual Meeting Data | H. Hung, G. Friedland, and C. Yeo | In Multimedia Interaction and Intelligent User Interfaces: Principles, Methods, and Applications, M. Etoh, J. Luo, and L. Shao, eds., pp. 25-59 | 2010 | Speech | |
| CUDA-Level Performance with Python-Level Productivity for Gaussian Mixture Model Applications | H. Cook, E. Gonina, S. Kamil, G. Friedland, D. Patterson, and A. Fox | Proceedings of the Third USENIX Workshop on Hot Topics in Parallelism (HotPar ’11), Berkeley, California | May 2011 | Speech | [PDF] |
| User Verification: Matching the Uploaders of Videos Across Accounts | H. Lei, J. Choi, A. Janin, and G. Friedland | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011), Prague, Czech Republic, pp. 2404-2407 | May 2011 | Speech | [PDF] |
| Automatic Tagging and Geo-Tagging in Video Collections and Communities | M. Larson, M. Soleymani, P. Serdyukov, S. Rudinac, C. Wartena, V. Murdock, G. Friedland, R. Ordelman, and G. J. F. Jones | Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR 2011), Trento, Italy | April 2011 | Speech | [PDF] |
| The SRI NIST 2010 Speaker Recognition Evaluation System | N. Scheffer, L. Ferrer, M. Graciarena, S. Kajarekar, E. Shriberg, and A. Stolcke | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011), Prague, Czech Republic, pp. 5292-5295 | May 2011 | Speech | [PDF] |
| Speech and Audio Signal Processing: Processing and Perception of Speech and Music, 2nd Edition | B. Gold, N. Morgan, and D. Ellis | Wiley | November 2011 | Speech | |
| Deep and Wide: Multiple Layers in Automatic Speech Recognition | N. Morgan | IEEE Transactions on Audio, Speech, and Language Processing, Special Issue on Deep Learning | 2011 | Speech | [PDF] |
| The IBM 2009 GALE Arabic Speech Transcription System | B. Kingsbury, H. Soltau, G. Saon, S. Chu, H.-K. Kuo, L. Mangu, S. Ravuri, A. Janin, and N. Morgan | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011), Prague, Czech Republic, pp. 4672-4675 | May 2011 | Speech | [PDF] |
| Language-Independent Constrained Cepstral Features for Speaker Recognition | E. Shriberg and A. Stolcke | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011), Prague, Czech Republic, pp. 5296-5299 | May 2011 | Speech | [PDF] |
| Bird Species Recognition Combining Acoustic and Sequence Modeling | M. Graciarena, M. Delplanche, E. Shriberg, and A. Stolcke | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011), Prague, Czech Republic, pp. 341-344 | May 2011 | Speech | [PDF] |
| Making the Most from Multiple Microphones in Meeting Recognition | A. Stolcke | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011), Prague, Czech Republic, pp. 4992-4995 | May 2011 | Speech | [PDF] |
| Improving Language Recognition with Multilingual Phone Recognition and Speaker Adaptation Transforms | A. Stolcke, M. Akbacak, L. Ferrer, S. Kajarekar, C. Richey, N. Scheffer, and E. Shriberg | Proceedings of the Odyssey Speaker and Language Recognition Workshop, Brno, Czech Republic, pp. 256-262 | June 2010 | Speech | [PDF] |
| The Automatic Recognition of Emotions in Speech | A. Batliner, B. Schuller, D. Seppi, S. Steidl, L. Devillers, L. Vidrascu, T. Vogt, V. Aharonson, and N. Amir | In Emotion-Oriented Systems: The Humaine Handbook, Cognitive Technologies, P. Petta, C. Pelachaud, and R. Cowie, eds., pp. 71-99, Springer | 2011 | Speech | |
| Exploiting User Feedback for Language Model Adaptation in Meeting Recognition | D. Vergyri, A. Stolcke, and G. Tur | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2009), Taipei, Taiwan, pp. 4737-4740 | April 2009 | Speech | [PDF] |
| Associating Children’s Non-Verbal and Verbal Behaviour: Body Movements, Emotions, and Laughter in a Human-Robot Interaction | A. Batliner, S. Steidl, and E. Nöth | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011), Prague, Czech Republic, pp. 22-27 | May 2011 | Speech | [PDF] |
| Comparing Multilayer Perceptron to Deep Belief Network Tandem Features for Robust ASR | O. Vinyals and S. Ravuri | Proceedings of the 36th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '11), Prague, Czech Republic | May 2011 | Speech | [PDF] |
| On the Use of Spectro-Temporal Features in Noise-Additive Speech | S. Ravuri | UC Berkeley Master's thesis, Spring 2011 | 2011 | Speech | [PDF] |
| Improved Overlapped Speech Handling for Speaker Diarization | K. Boakye, O. Vinyals, and G. Friedland | Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, pp. 941-944 | August 2011 | Speech | |
| How Good Is the Crowd at "Real" WSD? | J. Hong and C. F. Baker | Proceedings of the Fifth Linguistic Annotation Workshop (LAW-V), Portland, Oregon | June 2011 | Speech | [PDF] |
| Data Selection with Kurtosis and Nasality Features for Speaker Recognition | H. Lei and N. Mirghafori | Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, pp. 2753-2756 | August 2011 | Speech | [PDF] |
| Improved Classification of Speaking Styles for Mental Health Monitoring Using Phoneme Dynamics | K. Chang, H. Lei, and J. Canny | Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, pp. 85-88 | August 2011 | Speech | [PDF] |
| Effective Arabic Dialect Classification Using Diverse Phonotactic Models | M. Akbacak, D. Vergyri, A. Stolcke, N. Scheffer, and A. Mandal | Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, pp. 737-740 | August 2011 | Speech | [PDF] |
| Constrained Cepstral Speaker Recognition Using Matched UBM and JFA Training | M. H. Sanchez, L. Ferrer, E. Shriberg, and A. Stolcke | Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, pp. 141-144 | August 2011 | Speech | [PDF] |
| Speaker Diarization | G. Friedland | In Speech and Audio Signal Processing, 2nd edition, B. Gold, N. Morgan, D. Ellis, eds., Wiley | 2011 | Speech | |
| Video2GPS: A Demo of Multimodal Location Estimation on Flickr Videos | G. Friedland, J. Choi, and A. Janin | Proceedings of the ACM Multimedia Conference (MM'11), Scottsdale, Arizona | November 2011 | Speech | [PDF] |
| Data-Driven vs. Semantic-Technology-Driven Tag-Based Video Location Estimation | J. Choi and G. Friedland | Proceedings of the IEEE International Conference on Semantic Computing (ICSC 2011), Palo Alto, California, pp. 243-246 | September 2011 | Speech | [PDF] |
| Estimating Dominance in Multi-Party Meetings Using Speaker Diarization from a Single Microphone | H. Hung, Y. Huang, G. Friedland, and D. Gatica-Perez | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 19, No. 4, pp. 847-860 | May 2011 | Speech | |
| Java Visual Speech Components for Rapid Application Development of GUI based Speech Processing Applications | S. Steidl, K. Riedhammer, T. Bocklet, F. Hoenig, and E. Noeth | Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, pp. 3257-3260 | August 2011 | Speech | |
| Comparing Different Flavors of Spectro-Temporal Features for ASR | B. T. Meyer, S. V. Ravuri, M. R. Schaedler, and N. Morgan | Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, pp. 1269-1272 | August 2011 | Speech | [PDF] |
| The ICSI RT-09 Speaker Diarization System | G. Friedland, A. Janin, D. Imseng, X. Anguera, L. Gottlieb, M. Huijbregts, M. Knox, and O. Vinyals | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 2, pp. 371-381 | February 2012 | Speech | [PDF] |
| Speaker Diarization | G. Friedland and F. Valente | In Multimodal Signal Processing: Human Interactions in Meetings, S. Renals, H. Bourlard, J. Carletta, and A. Popescu-Belis, eds., Cambridge University Press | June 2012 | Speech | |
| Narrative Theme Navigation for Sitcoms Supported by Fan-Generated Scripts | G. Friedland, A. Janin, and L. Gottlieb | To appear in Multimedia Tools and Applications, Springer | 2012 | Speech | [PDF] |
| Fast Speaker Diarization Using a High-Level Scripting Language | E. Gonina, G. Friedland, H. Cook, and K. Keutzer | Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2011), Big Island, Hawaii | December 2011 | Speech | [PDF] |
| On the Applicability of Speaker Diarization to Audio Concept Detection for Multimedia Retrieval | R. Mertens, P.-S. Huang, L. Gottlieb, G. Friedland, and A. Divakaran | Proceedings of the IEEE International Symposium on Multimedia, Dana Point, California, pp. 446-451 | December 2011 | Speech | [PDF] |
| Acoustic Super Models for Large Scale Video Event Detection | R. Mertens, H. Lei, L. Gottlieb, G. Friedland, and A. Divakaran | Proceedings of the ACM International Workshop on Events in Multimedia (EiMM11), Scottsdale, Arizona | November 2011 | Speech | [PDF] |
| Multimodal Location Estimation on Flickr Videos | G. Friedland, J. Choi, H. Lei, and A. Janin | Proceedings of the ACM International Workshop on Social Media (WSM11), Scottsdale, Arizona | November 2011 | Speech | [PDF] |
| The 2011 ICSI Video Location Estimation System | J. Choi, H. Lei, and G. Friedland | Proceedings of the MediaEval 2011 Workshop, Pisa, Italy | September 2011 | Speech | [PDF] |
| Review of A. Rahman, et al., "Spatial-Geometric Approach to Physical Mobile Interaction Based on Accelerometer and IR Sensory Data Fusion" | G. Friedland | ACM Computing Reviews, CR139264 | July 2011 | Speech | |
| Review of J. Ajmera, et al., "Two-Stream Indexing for Spoken Web Search" | G. Friedland | ACM Computing Reviews, CR139192 | June 2011 | Speech | |
| Review of C. Simon, et al., "Visual Event Recognition Using Decision Trees" | G. Friedland | ACM Computing Reviews, CR138638 | January 2011 | Speech | |
| Improving Automatic Speech Recognition by Learning from Human Errors | B. T. Meyer | Proceedings of the 162nd Meeting of the Acoustical Society of America, San Diego, California | October 2011 | Speech | |
| Don't Multiply Lightly: Quantifying Problems with the Acoustic Model Assumptions in Speech Recognition | D. Gillick, L. Gillick, and S. Wegmann | Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU), Big Island, Hawaii | December 2011 | Speech | [PDF] |
| Data-Driven vs. Semantic-Technology-Driven Tag-Based Video Location Estimation | J. Choi and G. Friedland | Proceedings of the Fifth IEEE International Conference on Semantic Computing (ICSC 2011), Palo Alto, California, pp. 243-246 | September 2011 | Speech | [PDF] |
| Introduction to the Special Section on Deep Learning for Speech and Language Processing | D. Yu, G. Hinton, N. Morgan, J.-T. Chien, and S. Sagayama | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 1, pp. 4-6 | January 2012 | Speech | [PDF] |
| Deep and Wide: Multiple Layers in Automatic Speech Recognition | N. Morgan | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 1, pp. 7-13 | January 2012 | Speech | [PDF] |
| Multimodal City-Verification on Flickr Videos Using Acoustic and Textual Features | H. Lei, J. Choi, and G. Friedland | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan | March 2012 | Speech | [PDF] |
| Spectro-Temporal Gabor Features for Speaker Recognition | H. Lei, B. T. Meyer, and N. Mirghafori | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan | March 2012 | Speech | [PDF] |
| Discriminative Training for Speech Recognition is Compensating for Statistical Dependence in the HMM Framework | D. Gillick, S. Wegmann, and L. Gillick | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan | March 2012 | Speech | [PDF] |
| How to Put It Into Words - Using Random Forests to Extract Symbol Level Descriptions from Audio Content for Concept Detection | P.-S. Huang, R. Mertens, A. Divakaran, G. Friedland, and M. Hasegawa-Johnson | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan | March 2012 | Speech | [PDF] |