Publication Search Results

| Title | Author | Bibliographic | Date | Group | Links |
| --- | --- | --- | --- | --- | --- |
| Hearing is Believing: Biologically-Inspired Feature Extraction for Robust Automatic Speech Recognition | R. M. Stern and N. Morgan | Signal Processing Magazine, Vol. 29, No. 6, pp. 34-43 | November 2012 | Speech | [PDF] |
| There is No Data Like Less Data: Percepts for Video Concept Detection on Consumer-Produced Media | B. Elizalde, G. Friedland, H. Lei, and A. Divakaran | Proceedings of the ACM International Workshop on Audio and Multimedia Methods for Large-Scale Video Analysis (AMVA) at ACM Multimedia 2012 (MM'12), Nara, Japan, pp. 27-32 | October 2012 | Speech | [PDF] |
| Pushing the Limits of Mechanical Turk: Qualifying the Crowd for Video Geo-Location | L. Gottlieb, J. Choi, P. Kelm, T. Sikora, and G. Friedland | Proceedings of the ACM Workshop on Crowdsourcing for Multimedia (CrowdMM 2012), held in conjunction with ACM Multimedia 2012, pp. 23-28, Nara, Japan | October 2012 | Speech | [PDF] |
| The 2012 ICSI/Berkeley Video Location Estimation System | J. Choi, V. Ekambaram, G. Friedland, and K. Ramchandran | Presented at the MediaEval 2012 Workshop, Pisa, Italy | October 2012 | Speech | [PDF] |
| Where did I go Wrong?: Identifying Troublesome Segments for Speaker Diarization Systems | M. T. Knox, N. Mirghafori, and G. Friedland | Proceedings of the 13th Annual Conference of the International Speech Communication Association (InterSpeech 2012), Portland, Oregon | September 2012 | Speech | [PDF] |
| Hooking Up Spectro-Temporal Filters with Auditory-Inspired Representations for Robust Automatic Speech Recognition | B. Meyer, C. Spille, B. Kollmeier, and N. Morgan | Proceedings of the 13th Annual Conference of the International Speech Communication Association (InterSpeech 2012), Portland, Oregon | September 2012 | Speech | [PDF] |
| Longer Features: They Do a Speech Detector Good | TJ Tsai and N. Morgan | Proceedings of the 13th Annual Conference of the International Speech Communication Association (InterSpeech 2012), Portland, Oregon | September 2012 | Speech | |
| Multimodal Location Estimation of Consumer Media – Dealing with Sparse Training Data | J. Choi, G. Friedland, V. Ekambaram, and K. Ramchandran | Proceedings of the IEEE International Conference on Multimedia and Expo, Melbourne, Australia, pp. 43-48 | July 2012 | Speech | [PDF] |
| Speaker Diarization | G. Friedland and F. Valente | In Multimodal Signal Processing: Human Interactions in Meetings, S. Renals, H. Bourlard, J. Carletta, and A. Popescu-Belis, eds., Cambridge University Press | June 2012 | Speech | |
| Semi-Autonomous Car Control Using Brain Computer Interfaces | D. Goehring, D. Latotzky, M. Wang, and R. Rojas | Proceedings of the 12th International Conference of Intelligent Autonomous Systems (IAS), Jeju Island, Korea | June 2012 | Speech | |
| Cybercasing the Joint: Language Technologies, Multimedia Retrieval, and Online Privacy | G. Friedland | Presented at the Language Technologies Institute Colloquium, Carnegie Mellon University, Pittsburgh, Pennsylvania | April 13, 2012 | Speech | [PDF] |
| Multimodal City-Verification on Flickr Videos Using Acoustic and Textual Features | H. Lei, J. Choi, and G. Friedland | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan | March 2012 | Speech | [PDF] |
| Spectro-Temporal Gabor Features for Speaker Recognition | H. Lei, B. T. Meyer, and N. Mirghafori | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan | March 2012 | Speech | [PDF] |
| Discriminative Training for Speech Recognition is Compensating for Statistical Dependence on the HMM Framework | D. Gillick, S. Wegmann, and L. Gillick | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan | March 2012 | Speech | [PDF] |
| How to Put It Into Words - Using Random Forests to Extract Symbol Level Descriptions from Audio Content for Concept Detection | P.-S. Huang, R. Mertens, A. Divakaran, G. Friedland, and M. Hasegawa-Johnson | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan | March 2012 | Speech | [PDF] |
| Easy Does It: Robust Spectro-Temporal Many-Stream ASR Without Fine Tuning Streams | S. Ravuri and N. Morgan | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan | March 2012 | Speech | |
| Articulatory Features for Expressive Speech Synthesis | A. Black, H. T. Bunnell, Y. Dou, P. Kumar, F. Metze, D. Perry, T. Polzehl, K. Prahallad, S. Steidl, and C. Vaug | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan | March 2012 | Speech | [PDF] |
| From AUDREY to Siri: Is Speech Recognition A Solved Problem? | R. Pieraccini | Presented at the Mobile Voice Conference, San Francisco, California | March 2012 | Speech | [PDF] |
| Special Section on New Frontiers in Rich Transcription | G. Friedland, J. Fiscus, T. Hain, and S. Furui (eds.) | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, No. 2 | February 2012 | Speech | |
| Speaker Diarization: A Review of Recent Research | X. Anguera, S. Bozonnet, N. Evans, C. Fredouille, G. Friedland, and O. Vinyals | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 2, pp. 356-370 | February 2012 | Speech | [PDF] |
| The ICSI RT-09 Speaker Diarization System | G. Friedland, A. Janin, D. Imseng, X. Anguera, L. Gottlieb, M. Huijbregts, M. Knox, and O. Vinyals | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 2, pp. 371-381 | February 2012 | Speech | [PDF] |
| Introduction to the Special Section on Deep Learning for Speech and Language Processing | D. Yu, G. Hinton, N. Morgan, J.-T. Chien, and S. Sagayama | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 1, pp. 4-6 | January 2012 | Speech | [PDF] |
| Deep and Wide: Multiple Layers in Automatic Speech Recognition | N. Morgan | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 1, pp. 7-13 | January 2012 | Speech | [PDF] |
| Narrative Theme Navigation for Sitcoms Supported by Fan-Generated Scripts | G. Friedland, A. Janin, and L. Gottlieb | To appear in Multimedia Tools and Applications, Springer | 2012 | Speech | [PDF] |
| Syllable Models for Mandarin Speech Recognition: Exploiting Character Language Models | X. Liu, J. L. Hieronymus, M. J. F. Gales, and P. C. Woodland | In submission | 2012 | Speech | |
| Features Based on Auditory Physiology and Perception | R. M. Stern and N. Morgan | In Techniques for Noise Robustness in Automatic Speech Recognition, T. Virtanen, B. Raj, and R. Singh, eds., Wiley Publishing | 2012 | Speech | |
| Fast Speaker Diarization Using a High-Level Scripting Language | E. Gonina, G. Friedland, H. Cook, and K. Keutzer | Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2011), Big Island, Hawaii | December 2011 | Speech | [PDF] |
| On the Applicability of Speaker Diarization to Audio Concept Detection for Multimedia Retrieval | R. Mertens, P.-S. Huang, L. Gottlieb, G. Friedland, and A. Divakaran | Proceedings of the IEEE International Symposium on Multimedia, Dana Point, California, pp. 446-451 | December 2011 | Speech | [PDF] |
| Don't Multiply Lightly: Quantifying Problems with the Acoustic Model Assumptions in Speech Recognition | D. Gillick, L. Gillick, and S. Wegmann | Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU), Big Island, Hawaii | December 2011 | Speech | [PDF] |
| Finding Difficult Speakers in Automatic Speaker Recognition | L. Stoll | UC Berkeley PhD thesis, Berkeley, California | December 2011 | Speech | [PDF] |
| Speech and Audio Signal Processing: Processing and Perception of Speech and Music, 2nd Edition | B. Gold, N. Morgan, and D. Ellis | Wiley | November 2011 | Speech | |
| Video2GPS: A Demo of Multimodal Location Estimation on Flickr Videos | G. Friedland, J. Choi, and A. Janin | Proceedings of the ACM Multimedia Conference (MM'11), Scottsdale, Arizona | November 2011 | Speech | [PDF] |
| Acoustic Super Models for Large Scale Video Event Detection | R. Mertens, H. Lei, L. Gottlieb, G. Friedland, and A. Divakaran | Proceedings of the ACM International Workshop on Events in Multimedia (EiMM11), Scottsdale, Arizona | November 2011 | Speech | [PDF] |
| Multimodal Location Estimation on Flickr Videos | G. Friedland, J. Choi, H. Lei, and A. Janin | Proceedings of the ACM International Workshop on Social Media (WSM11), Scottsdale, Arizona | November 2011 | Speech | [PDF] |
| Improving Automatic Speech Recognition by Learning from Human Errors | B. T. Meyer | Proceedings of the 162nd Meeting of the Acoustical Society of America, San Diego, California | October 2011 | Speech | |
| Data-Driven vs. Semantic-Technology-Driven Tag-Based Video Location Estimation | J. Choi and G. Friedland | Proceedings of the Fifth IEEE International Conference on Semantic Computing (ICSC 2011), Palo Alto, California, pp. 243-246 | September 2011 | Speech | [PDF] |
| The 2011 ICSI Video Location Estimation System | J. Choi, H. Lei, and G. Friedland | Proceedings of the MediaEval 2011 Workshop, Pisa, Italy | September 2011 | Speech | [PDF] |
| Improved Overlapped Speech Handling for Speaker Diarization | K. Boakye, O. Vinyals, and G. Friedland | Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, pp. 941-944 | August 2011 | Speech | |
| Data Selection with Kurtosis and Nasality Features for Speaker Recognition | H. Lei and N. Mirghafori | Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, pp. 2753-2756 | August 2011 | Speech | [PDF] |
| Improved Classification of Speaking Styles for Mental Health Monitoring using Phoneme Dynamics | K. Chang, H. Lei, and J. Canny | Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, pp. 85-88 | August 2011 | Speech | [PDF] |
| Effective Arabic Dialect Classification Using Diverse Phonotactic Models | M. Akbacak, D. Vergyri, A. Stolcke, N. Scheffer, and A. Mandal | Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, pp. 737-740 | August 2011 | Speech | [PDF] |
| Constrained Cepstral Speaker Recognition Using Matched UBM and JFA Training | M. H. Sanchez, L. Ferrer, E. Shriberg, and A. Stolcke | Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, pp. 141-144 | August 2011 | Speech | [PDF] |
| Java Visual Speech Components for Rapid Application Development of GUI based Speech Processing Applications | S. Steidl, K. Riedhammer, T. Bocklet, F. Hoenig, and E. Noeth | Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, pp. 3257-3260 | August 2011 | Speech | |
| Comparing Different Flavors of Spectro-Temporal Features for ASR | B. T. Meyer, S. V. Ravuri, M. R. Schaedler, and N. Morgan | Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, pp. 1269-1272 | August 2011 | Speech | [PDF] |
| Review of A. Rahman, et al., "Spatial-Geometric Approach to Physical Mobile Interaction Based on Accelerometer and IR Sensory Data Fusion" | G. Friedland | ACM Computing Reviews, CR139264 | July 2011 | Speech | |
| How Good Is the Crowd at "Real" WSD? | J. Hong and C. F. Baker | Proceedings of the Fifth Linguistic Annotation Workshop (LAW-V), Portland, Oregon | June 2011 | Speech | [PDF] |
| Review of J. Ajmera, et al., "Two-Stream Indexing for Spoken Web Search" | G. Friedland | ACM Computing Reviews, CR139192 | June 2011 | Speech | |
| Gappy Phrasal Alignment by Agreement | M. Bansal, C. Quirk, and R. C. Moore | Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 1308-1317, Portland, Oregon | June 2011 | Speech | [PDF] |
| The Surprising Variance in Shortest-Derivation Parsing | M. Bansal and D. Klein | Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon | June 2011 | Speech | [PDF] |
